{
 "metadata": {
  "name": "",
  "signature": "sha256:0e119d09c9a6aa7c07a2d8793325942daa8c14bd2cbca47bc247be4d926c625b"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "MLlib: Basic Statistics and Exploratory Data Analysis"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[Introduction to Spark with Python, by Jose A. Dianes](https://github.com/jadianes/spark-py-notebooks)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So far we have used different map and aggregation functions on simple and key/value pair RDDs to get simple statistics that help us understand our datasets. In this notebook we will introduce Spark's machine learning library [MLlib](https://spark.apache.org/docs/latest/mllib-guide.html) through its basic statistics functionality in order to better understand our dataset. We will use the reduced 10-percent [KDD Cup 1999](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html) dataset throughout the notebook.   "
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Getting the data and creating the RDD"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we did in our first notebook, we will use the reduced dataset (10 percent) provided for the [KDD Cup 1999](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html), containing nearly half a million network interactions. The file is provided as a Gzip file that we will download locally.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import urllib\n",
      "f = urllib.urlretrieve (\"http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz\", \"kddcup.data_10_percent.gz\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "data_file = \"./kddcup.data_10_percent.gz\"\n",
      "raw_data = sc.textFile(data_file)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Local vectors"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A [local vector](https://spark.apache.org/docs/latest/mllib-data-types.html#local-vector) is often used as a base type for RDDs in Spark MLlib. A local vector has integer-typed and 0-based indices and double-typed values, stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector is backed by a double array representing its entry values, while a sparse vector is backed by two parallel arrays: indices and values. "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For dense vectors, MLlib uses either Python *lists* or the *NumPy* `array` type. The latter is recommended, so you can simply pass NumPy arrays around.  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For sparse vectors, users can construct a `SparseVector` object from MLlib or pass *SciPy* `scipy.sparse` column vectors if SciPy is available in their environment. The easiest way to create sparse vectors is to use the factory methods implemented in `Vectors`.  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "An RDD of dense vectors"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's represent each network interaction in our dataset as a dense vector. For that we will use the *NumPy* `array` type.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "def parse_interaction(line):\n",
      "    line_split = line.split(\",\")\n",
      "    # keep just numeric and logical values\n",
      "    symbolic_indexes = [1,2,3,41]\n",
      "    clean_line_split = [item for i,item in enumerate(line_split) if i not in symbolic_indexes]\n",
      "    return np.array([float(x) for x in clean_line_split])\n",
      "\n",
      "vector_data = raw_data.map(parse_interaction)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
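    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Because `parse_interaction` is plain Python, it can be sanity-checked on a single line before Spark applies it to the whole file. The cell below is a sketch that uses a standalone copy named `parse_interaction_local` (a name introduced only for this check) on an illustrative line written in the KDD Cup 1999 format, 41 features followed by the label. The line is meant for testing the parser and does not claim to be a particular record in the file. Removing the four symbolic columns (indexes 1, 2, 3 and 41) should leave 38 numeric values."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Columns 1, 2 and 3 are protocol_type, service and flag; column 41 is the label.\n",
      "SYMBOLIC_INDEXES = set([1, 2, 3, 41])\n",
      "\n",
      "# Standalone copy of parse_interaction, so it can be checked without Spark.\n",
      "def parse_interaction_local(line):\n",
      "    line_split = line.split(',')\n",
      "    return np.array([float(x) for i, x in enumerate(line_split) if i not in SYMBOLIC_INDEXES])\n",
      "\n",
      "# Illustrative line in the KDD Cup 1999 format: 41 features followed by the label.\n",
      "sample_line = ('0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,'\n",
      "               '0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,'\n",
      "               '0.00,0.00,0.00,0.00,0.00,normal.')\n",
      "\n",
      "sample_vector = parse_interaction_local(sample_line)\n",
      "print('{} numeric values, duration={}, src_bytes={}'.format(\n",
      "    len(sample_vector), sample_vector[0], sample_vector[1]))\n",
      "# 38 numeric values, duration=0.0, src_bytes=181.0"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },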
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Summary statistics"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Spark's MLlib provides column summary statistics for `RDD[Vector]` through the function [`colStats`](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics.colStats) available in [`Statistics`](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics). The method returns an instance of [`MultivariateStatisticalSummary`](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.stat.MultivariateStatisticalSummary), which contains the column-wise *max*, *min*, *mean*, *variance*, and *number of nonzeros*, as well as the *total count*.  "
     ]
    },
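    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the fields of the returned summary concrete, the cell below is a local NumPy sketch that computes the same column-wise quantities (count, mean, variance, max, min and number of nonzeros) on a tiny hypothetical 4 x 2 matrix, where rows are interactions and columns are features. It assumes that MLlib reports the sample variance, with an n - 1 denominator. The numbers it prints come from NumPy, not from `colStats`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Tiny hypothetical data set: 4 rows (interactions) x 2 columns (features).\n",
      "rows = np.array([[0.0, 10.0],\n",
      "                 [2.0, 0.0],\n",
      "                 [0.0, 30.0],\n",
      "                 [6.0, 20.0]])\n",
      "\n",
      "# Column-wise statistics analogous to those in MultivariateStatisticalSummary.\n",
      "local_summary = {\n",
      "    'count': rows.shape[0],\n",
      "    'mean': rows.mean(axis=0),\n",
      "    'variance': rows.var(axis=0, ddof=1),  # sample variance (n - 1 denominator)\n",
      "    'max': rows.max(axis=0),\n",
      "    'min': rows.min(axis=0),\n",
      "    'numNonzeros': np.count_nonzero(rows, axis=0),\n",
      "}\n",
      "\n",
      "for name in ['count', 'mean', 'variance', 'max', 'min', 'numNonzeros']:\n",
      "    print('{}: {}'.format(name, local_summary[name]))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },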
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from pyspark.mllib.stat import Statistics \n",
      "from math import sqrt \n",
      "\n",
      "# Compute column summary statistics.\n",
      "summary = Statistics.colStats(vector_data)\n",
      "\n",
      "print \"Duration Statistics:\"\n",
      "print \" Mean: {}\".format(round(summary.mean()[0],3))\n",
      "print \" St. deviation: {}\".format(round(sqrt(summary.variance()[0]),3))\n",
      "print \" Max value: {}\".format(round(summary.max()[0],3))\n",
      "print \" Min value: {}\".format(round(summary.min()[0],3))\n",
      "print \" Total value count: {}\".format(summary.count())\n",
      "print \" Number of non-zero values: {}\".format(summary.numNonzeros()[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Duration Statistics:\n",
        " Mean: 47.979\n",
        " St. deviation: 707.746\n",
        " Max value: 58329.0\n",
        " Min value: 0.0\n",
        " Total value count: 494021\n",
        " Number of non-zero values: 12350.0\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Summary statistics by label  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The interesting part of summary statistics, in our case, comes from being able to obtain them by type of network attack, or 'label', in our dataset. By doing so we can better characterise our dependent variable (the label) in terms of the range of values of the independent variables.  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To do so, we can build an RDD with labels as keys and vectors as values, and then filter it by key. For that we just need to adapt our `parse_interaction` function to return a tuple with both elements.     "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def parse_interaction_with_key(line):\n",
      "    line_split = line.split(\",\")\n",
      "    # keep just numeric and logical values\n",
      "    symbolic_indexes = [1,2,3,41]\n",
      "    clean_line_split = [item for i,item in enumerate(line_split) if i not in symbolic_indexes]\n",
      "    return (line_split[41], np.array([float(x) for x in clean_line_split]))\n",
      "\n",
      "label_vector_data = raw_data.map(parse_interaction_with_key)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The next step is straightforward: we use `filter` on the RDD to leave out every label except the one we want to gather statistics for.    "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "normal_label_data = label_vector_data.filter(lambda x: x[0]==\"normal.\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we can use the new RDD to call `colStats` on the values.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "normal_summary = Statistics.colStats(normal_label_data.values())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And print the results as we did before.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print \"Duration Statistics for label: {}\".format(\"normal\")\n",
      "print \" Mean: {}\".format(round(normal_summary.mean()[0],3))\n",
      "print \" St. deviation: {}\".format(round(sqrt(normal_summary.variance()[0]),3))\n",
      "print \" Max value: {}\".format(round(normal_summary.max()[0],3))\n",
      "print \" Min value: {}\".format(round(normal_summary.min()[0],3))\n",
      "print \" Total value count: {}\".format(normal_summary.count())\n",
      "print \" Number of non-zero values: {}\".format(normal_summary.numNonzeros()[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Duration Statistics for label: normal\n",
        " Mean: 216.657\n",
        " St. deviation: 1359.213\n",
        " Max value: 58329.0\n",
        " Min value: 0.0\n",
        " Total value count: 97278\n",
        " Number of non-zero values: 11690.0\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Instead of working with key/value pairs, we could have just filtered the raw data, splitting each line and checking the label in column 41, and then parsed the results as we did before. This would work as well. However, having our data organised as key/value pairs opens the door to richer manipulations. And since `values()` is a transformation on an RDD, not an action, no computation is performed until we call `colStats` anyway.  "
     ]
    },
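    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The two strategies can be compared in a small, Spark-free sketch in which plain Python lists stand in for RDDs, so nothing here is lazily evaluated. The helpers `make_line`, `features`, `filter_raw_then_parse` and `parse_then_filter_by_key` are hypothetical names introduced for this example, and the generated lines use placeholder values. Both strategies should keep exactly the same vectors."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "SYMBOLIC_INDEXES = set([1, 2, 3, 41])\n",
      "\n",
      "def make_line(duration, label):\n",
      "    # Hypothetical helper: a 42-field line with the given duration and label,\n",
      "    # placeholder symbolic values, and zeros for the remaining numeric features.\n",
      "    fields = [str(duration), 'tcp', 'http', 'SF'] + ['0'] * 37 + [label]\n",
      "    return ','.join(fields)\n",
      "\n",
      "def features(fields):\n",
      "    # Numeric and logical values only, as in parse_interaction.\n",
      "    return np.array([float(x) for i, x in enumerate(fields) if i not in SYMBOLIC_INDEXES])\n",
      "\n",
      "def filter_raw_then_parse(lines, label):\n",
      "    # Alternative approach: filter on the label in column 41, then parse.\n",
      "    return [features(line.split(',')) for line in lines if line.split(',')[41] == label]\n",
      "\n",
      "def parse_then_filter_by_key(lines, label):\n",
      "    # Approach used in this notebook: build (label, vector) pairs, then filter by key.\n",
      "    pairs = [(line.split(',')[41], features(line.split(','))) for line in lines]\n",
      "    return [vector for key, vector in pairs if key == label]\n",
      "\n",
      "lines = [make_line(0, 'smurf.'), make_line(12, 'normal.'), make_line(30, 'normal.')]\n",
      "a = filter_raw_then_parse(lines, 'normal.')\n",
      "b = parse_then_filter_by_key(lines, 'normal.')\n",
      "print('{} vs {}'.format([float(v[0]) for v in a], [float(v[0]) for v in b]))\n",
      "# [12.0, 30.0] vs [12.0, 30.0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },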
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's wrap this within a function so we can reuse it with any label."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def summary_by_label(raw_data, label):\n",
      "    label_vector_data = raw_data.map(parse_interaction_with_key).filter(lambda x: x[0]==label)\n",
      "    return Statistics.colStats(label_vector_data.values())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's give it a try with the \"normal.\" label again.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "normal_sum = summary_by_label(raw_data, \"normal.\")\n",
      "\n",
      "print \"Duration Statistics for label: {}\".format(\"normal\")\n",
      "print \" Mean: {}\".format(round(normal_sum.mean()[0],3))\n",
      "print \" St. deviation: {}\".format(round(sqrt(normal_sum.variance()[0]),3))\n",
      "print \" Max value: {}\".format(round(normal_sum.max()[0],3))\n",
      "print \" Min value: {}\".format(round(normal_sum.min()[0],3))\n",
      "print \" Total value count: {}\".format(normal_sum.count())\n",
      "print \" Number of non-zero values: {}\".format(normal_sum.numNonzeros()[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Duration Statistics for label: normal\n",
        " Mean: 216.657\n",
        " St. deviation: 1359.213\n",
        " Max value: 58329.0\n",
        " Min value: 0.0\n",
        " Total value count: 97278\n",
        " Number of non-zero values: 11690.0\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's now try with a network attack. All attack types are listed [here](http://kdd.ics.uci.edu/databases/kddcup99/training_attack_types).  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "guess_passwd_summary = summary_by_label(raw_data, \"guess_passwd.\")\n",
      "\n",
      "print \"Duration Statistics for label: {}\".format(\"guess_password\")\n",
      "print \" Mean: {}\".format(round(guess_passwd_summary.mean()[0],3))\n",
      "print \" St. deviation: {}\".format(round(sqrt(guess_passwd_summary.variance()[0]),3))\n",
      "print \" Max value: {}\".format(round(guess_passwd_summary.max()[0],3))\n",
      "print \" Min value: {}\".format(round(guess_passwd_summary.min()[0],3))\n",
      "print \" Total value count: {}\".format(guess_passwd_summary.count())\n",
      "print \" Number of non-zero values: {}\".format(guess_passwd_summary.numNonzeros()[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Duration Statistics for label: guess_password\n",
        " Mean: 2.717\n",
        " St. deviation: 11.88\n",
        " Max value: 60.0\n",
        " Min value: 0.0\n",
        " Total value count: 53\n",
        " Number of non-zero values: 4.0\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can see that this type of attack is shorter in duration than a normal interaction. We could build a table with duration statistics for each type of interaction in our dataset. First we need to get a list of labels as described in the first line [here](http://kdd.ics.uci.edu/databases/kddcup99/kddcup.names).      "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "label_list = [\"back.\",\"buffer_overflow.\",\"ftp_write.\",\"guess_passwd.\",\n",
      "              \"imap.\",\"ipsweep.\",\"land.\",\"loadmodule.\",\"multihop.\",\n",
      "              \"neptune.\",\"nmap.\",\"normal.\",\"perl.\",\"phf.\",\"pod.\",\"portsweep.\",\n",
      "              \"rootkit.\",\"satan.\",\"smurf.\",\"spy.\",\"teardrop.\",\"warezclient.\",\n",
      "              \"warezmaster.\"]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then we get a list of statistics for each label.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "stats_by_label = [(label, summary_by_label(raw_data, label)) for label in label_list]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 13
    },
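    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that this list comprehension calls `summary_by_label` once per label, so Spark launches a separate job for each of the 23 labels and, since `raw_data` is not cached, re-reads and re-parses the file every time. Purely as a point of comparison (this swaps in pandas and is not MLlib functionality), the cell below sketches how the same kind of duration table could be computed in a single `groupby` pass over a small hypothetical local sample. pandas' `std` uses the n - 1 (sample) denominator by default."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "\n",
      "# Hypothetical local sample: one row per interaction, with its label and duration.\n",
      "sample_df = pd.DataFrame({\n",
      "    'label': ['normal.', 'normal.', 'normal.', 'smurf.', 'smurf.'],\n",
      "    'duration': [0.0, 10.0, 20.0, 0.0, 0.0],\n",
      "})\n",
      "\n",
      "# A single pass: group the rows by label and aggregate the duration column.\n",
      "duration_table = sample_df.groupby('label')['duration'].agg(['mean', 'std', 'min', 'max', 'count'])\n",
      "duration_table.columns = ['Mean', 'Std Dev', 'Min', 'Max', 'Count']\n",
      "print(duration_table)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },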
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we extract the statistics for the *duration* column, the first in our dataset (i.e. index 0).  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "duration_by_label = [ \n",
      "    (stat[0], np.array([float(stat[1].mean()[0]), float(sqrt(stat[1].variance()[0])), float(stat[1].min()[0]), float(stat[1].max()[0]), int(stat[1].count())])) \n",
      "    for stat in stats_by_label]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can put these into a pandas data frame.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "pd.set_option('display.max_columns', 50)\n",
      "\n",
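      "# DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0, so build the frame directly\n",
      "stats_by_label_df = pd.DataFrame([stat[1] for stat in duration_by_label],\n",
      "                                 index=[stat[0] for stat in duration_by_label],\n",
      "                                 columns=[\"Mean\", \"Std Dev\", \"Min\", \"Max\", \"Count\"])"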
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And print it."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print \"Duration statistics, by label\"\n",
      "stats_by_label_df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Duration statistics, by label\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>Mean</th>\n",
        "      <th>Std Dev</th>\n",
        "      <th>Min</th>\n",
        "      <th>Max</th>\n",
        "      <th>Count</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>back.</th>\n",
        "      <td>    0.128915</td>\n",
        "      <td>    1.110062</td>\n",
        "      <td>   0</td>\n",
        "      <td>    14</td>\n",
        "      <td>   2203</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>buffer_overflow.</th>\n",
        "      <td>   91.700000</td>\n",
        "      <td>   97.514685</td>\n",
        "      <td>   0</td>\n",
        "      <td>   321</td>\n",
        "      <td>     30</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ftp_write.</th>\n",
        "      <td>   32.375000</td>\n",
        "      <td>   47.449033</td>\n",
        "      <td>   0</td>\n",
        "      <td>   134</td>\n",
        "      <td>      8</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>guess_passwd.</th>\n",
        "      <td>    2.716981</td>\n",
        "      <td>   11.879811</td>\n",
        "      <td>   0</td>\n",
        "      <td>    60</td>\n",
        "      <td>     53</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>imap.</th>\n",
        "      <td>    6.000000</td>\n",
        "      <td>   14.174240</td>\n",
        "      <td>   0</td>\n",
        "      <td>    41</td>\n",
        "      <td>     12</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ipsweep.</th>\n",
        "      <td>    0.034483</td>\n",
        "      <td>    0.438439</td>\n",
        "      <td>   0</td>\n",
        "      <td>     7</td>\n",
        "      <td>   1247</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>land.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>     21</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>loadmodule.</th>\n",
        "      <td>   36.222222</td>\n",
        "      <td>   41.408869</td>\n",
        "      <td>   0</td>\n",
        "      <td>   103</td>\n",
        "      <td>      9</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>multihop.</th>\n",
        "      <td>  184.000000</td>\n",
        "      <td>  253.851006</td>\n",
        "      <td>   0</td>\n",
        "      <td>   718</td>\n",
        "      <td>      7</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>neptune.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td> 107201</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>nmap.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>    231</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>normal.</th>\n",
        "      <td>  216.657322</td>\n",
        "      <td> 1359.213469</td>\n",
        "      <td>   0</td>\n",
        "      <td> 58329</td>\n",
        "      <td>  97278</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>perl.</th>\n",
        "      <td>   41.333333</td>\n",
        "      <td>   14.843629</td>\n",
        "      <td>  25</td>\n",
        "      <td>    54</td>\n",
        "      <td>      3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>phf.</th>\n",
        "      <td>    4.500000</td>\n",
        "      <td>    5.744563</td>\n",
        "      <td>   0</td>\n",
        "      <td>    12</td>\n",
        "      <td>      4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>pod.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>    264</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>portsweep.</th>\n",
        "      <td> 1915.299038</td>\n",
        "      <td> 7285.125159</td>\n",
        "      <td>   0</td>\n",
        "      <td> 42448</td>\n",
        "      <td>   1040</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>rootkit.</th>\n",
        "      <td>  100.800000</td>\n",
        "      <td>  216.185003</td>\n",
        "      <td>   0</td>\n",
        "      <td>   708</td>\n",
        "      <td>     10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>satan.</th>\n",
        "      <td>    0.040277</td>\n",
        "      <td>    0.522433</td>\n",
        "      <td>   0</td>\n",
        "      <td>    11</td>\n",
        "      <td>   1589</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>smurf.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td> 280790</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>spy.</th>\n",
        "      <td>  318.000000</td>\n",
        "      <td>   26.870058</td>\n",
        "      <td> 299</td>\n",
        "      <td>   337</td>\n",
        "      <td>      2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>teardrop.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>    979</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>warezclient.</th>\n",
        "      <td>  615.257843</td>\n",
        "      <td> 2207.694966</td>\n",
        "      <td>   0</td>\n",
        "      <td> 15168</td>\n",
        "      <td>   1020</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>warezmaster.</th>\n",
        "      <td>   15.050000</td>\n",
        "      <td>   33.385271</td>\n",
        "      <td>   0</td>\n",
        "      <td>   156</td>\n",
        "      <td>     20</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "                         Mean      Std Dev  Min    Max   Count\n",
        "back.                0.128915     1.110062    0     14    2203\n",
        "buffer_overflow.    91.700000    97.514685    0    321      30\n",
        "ftp_write.          32.375000    47.449033    0    134       8\n",
        "guess_passwd.        2.716981    11.879811    0     60      53\n",
        "imap.                6.000000    14.174240    0     41      12\n",
        "ipsweep.             0.034483     0.438439    0      7    1247\n",
        "land.                0.000000     0.000000    0      0      21\n",
        "loadmodule.         36.222222    41.408869    0    103       9\n",
        "multihop.          184.000000   253.851006    0    718       7\n",
        "neptune.             0.000000     0.000000    0      0  107201\n",
        "nmap.                0.000000     0.000000    0      0     231\n",
        "normal.            216.657322  1359.213469    0  58329   97278\n",
        "perl.               41.333333    14.843629   25     54       3\n",
        "phf.                 4.500000     5.744563    0     12       4\n",
        "pod.                 0.000000     0.000000    0      0     264\n",
        "portsweep.        1915.299038  7285.125159    0  42448    1040\n",
        "rootkit.           100.800000   216.185003    0    708      10\n",
        "satan.               0.040277     0.522433    0     11    1589\n",
        "smurf.               0.000000     0.000000    0      0  280790\n",
        "spy.               318.000000    26.870058  299    337       2\n",
        "teardrop.            0.000000     0.000000    0      0     979\n",
        "warezclient.       615.257843  2207.694966    0  15168    1020\n",
        "warezmaster.        15.050000    33.385271    0    156      20"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To reuse this code and get a data frame for any variable in our dataset, we will define a function.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def get_variable_stats_df(stats_by_label, column_i):\n",
      "    column_stats_by_label = [\n",
      "        (stat[0], np.array([float(stat[1].mean()[column_i]), float(sqrt(stat[1].variance()[column_i])), float(stat[1].min()[column_i]), float(stat[1].max()[column_i]), int(stat[1].count())])) \n",
      "        for stat in stats_by_label\n",
      "    ]\n",
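      "    # Build the frame directly (DataFrame.from_items was removed in pandas 1.0)\n",
      "    return pd.DataFrame([stat[1] for stat in column_stats_by_label],\n",
      "                        index=[stat[0] for stat in column_stats_by_label],\n",
      "                        columns=[\"Mean\", \"Std Dev\", \"Min\", \"Max\", \"Count\"])"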
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 17
    },
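    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Since `get_variable_stats_df` only calls `mean()`, `variance()`, `min()`, `max()` and `count()` on each summary, it can be exercised without a cluster. The sketch below passes it a hypothetical stand-in class, `LocalSummary`, that exposes those methods over a NumPy array (computing the sample variance, with an n - 1 denominator). It is a testing aid, not part of MLlib. The function is a standalone copy of the one above that builds the data frame with the `pd.DataFrame` constructor."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from math import sqrt\n",
      "\n",
      "import numpy as np\n",
      "import pandas as pd\n",
      "\n",
      "class LocalSummary(object):\n",
      "    # Hypothetical stand-in mimicking the methods of MultivariateStatisticalSummary.\n",
      "    def __init__(self, rows):\n",
      "        self._rows = np.asarray(rows, dtype=float)\n",
      "\n",
      "    def mean(self):\n",
      "        return self._rows.mean(axis=0)\n",
      "\n",
      "    def variance(self):\n",
      "        return self._rows.var(axis=0, ddof=1)\n",
      "\n",
      "    def min(self):\n",
      "        return self._rows.min(axis=0)\n",
      "\n",
      "    def max(self):\n",
      "        return self._rows.max(axis=0)\n",
      "\n",
      "    def count(self):\n",
      "        return self._rows.shape[0]\n",
      "\n",
      "def get_variable_stats_df(stats_by_label, column_i):\n",
      "    # One row per label: mean, standard deviation, min, max and count for column_i.\n",
      "    column_stats_by_label = [\n",
      "        (label, np.array([float(s.mean()[column_i]), float(sqrt(s.variance()[column_i])),\n",
      "                          float(s.min()[column_i]), float(s.max()[column_i]), int(s.count())]))\n",
      "        for label, s in stats_by_label\n",
      "    ]\n",
      "    return pd.DataFrame([values for _, values in column_stats_by_label],\n",
      "                        index=[label for label, _ in column_stats_by_label],\n",
      "                        columns=['Mean', 'Std Dev', 'Min', 'Max', 'Count'])\n",
      "\n",
      "fake_stats = [('normal.', LocalSummary([[0.0, 5.0], [10.0, 5.0], [20.0, 5.0]])),\n",
      "              ('smurf.', LocalSummary([[0.0, 1.0], [0.0, 3.0]]))]\n",
      "print(get_variable_stats_df(fake_stats, 0))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },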
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's try for *duration* again.   "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "get_variable_stats_df(stats_by_label,0)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>Mean</th>\n",
        "      <th>Std Dev</th>\n",
        "      <th>Min</th>\n",
        "      <th>Max</th>\n",
        "      <th>Count</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>back.</th>\n",
        "      <td>    0.128915</td>\n",
        "      <td>    1.110062</td>\n",
        "      <td>   0</td>\n",
        "      <td>    14</td>\n",
        "      <td>   2203</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>buffer_overflow.</th>\n",
        "      <td>   91.700000</td>\n",
        "      <td>   97.514685</td>\n",
        "      <td>   0</td>\n",
        "      <td>   321</td>\n",
        "      <td>     30</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ftp_write.</th>\n",
        "      <td>   32.375000</td>\n",
        "      <td>   47.449033</td>\n",
        "      <td>   0</td>\n",
        "      <td>   134</td>\n",
        "      <td>      8</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>guess_passwd.</th>\n",
        "      <td>    2.716981</td>\n",
        "      <td>   11.879811</td>\n",
        "      <td>   0</td>\n",
        "      <td>    60</td>\n",
        "      <td>     53</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>imap.</th>\n",
        "      <td>    6.000000</td>\n",
        "      <td>   14.174240</td>\n",
        "      <td>   0</td>\n",
        "      <td>    41</td>\n",
        "      <td>     12</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ipsweep.</th>\n",
        "      <td>    0.034483</td>\n",
        "      <td>    0.438439</td>\n",
        "      <td>   0</td>\n",
        "      <td>     7</td>\n",
        "      <td>   1247</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>land.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>     21</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>loadmodule.</th>\n",
        "      <td>   36.222222</td>\n",
        "      <td>   41.408869</td>\n",
        "      <td>   0</td>\n",
        "      <td>   103</td>\n",
        "      <td>      9</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>multihop.</th>\n",
        "      <td>  184.000000</td>\n",
        "      <td>  253.851006</td>\n",
        "      <td>   0</td>\n",
        "      <td>   718</td>\n",
        "      <td>      7</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>neptune.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td> 107201</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>nmap.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>    231</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>normal.</th>\n",
        "      <td>  216.657322</td>\n",
        "      <td> 1359.213469</td>\n",
        "      <td>   0</td>\n",
        "      <td> 58329</td>\n",
        "      <td>  97278</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>perl.</th>\n",
        "      <td>   41.333333</td>\n",
        "      <td>   14.843629</td>\n",
        "      <td>  25</td>\n",
        "      <td>    54</td>\n",
        "      <td>      3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>phf.</th>\n",
        "      <td>    4.500000</td>\n",
        "      <td>    5.744563</td>\n",
        "      <td>   0</td>\n",
        "      <td>    12</td>\n",
        "      <td>      4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>pod.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>    264</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>portsweep.</th>\n",
        "      <td> 1915.299038</td>\n",
        "      <td> 7285.125159</td>\n",
        "      <td>   0</td>\n",
        "      <td> 42448</td>\n",
        "      <td>   1040</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>rootkit.</th>\n",
        "      <td>  100.800000</td>\n",
        "      <td>  216.185003</td>\n",
        "      <td>   0</td>\n",
        "      <td>   708</td>\n",
        "      <td>     10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>satan.</th>\n",
        "      <td>    0.040277</td>\n",
        "      <td>    0.522433</td>\n",
        "      <td>   0</td>\n",
        "      <td>    11</td>\n",
        "      <td>   1589</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>smurf.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td> 280790</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>spy.</th>\n",
        "      <td>  318.000000</td>\n",
        "      <td>   26.870058</td>\n",
        "      <td> 299</td>\n",
        "      <td>   337</td>\n",
        "      <td>      2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>teardrop.</th>\n",
        "      <td>    0.000000</td>\n",
        "      <td>    0.000000</td>\n",
        "      <td>   0</td>\n",
        "      <td>     0</td>\n",
        "      <td>    979</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>warezclient.</th>\n",
        "      <td>  615.257843</td>\n",
        "      <td> 2207.694966</td>\n",
        "      <td>   0</td>\n",
        "      <td> 15168</td>\n",
        "      <td>   1020</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>warezmaster.</th>\n",
        "      <td>   15.050000</td>\n",
        "      <td>   33.385271</td>\n",
        "      <td>   0</td>\n",
        "      <td>   156</td>\n",
        "      <td>     20</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "                         Mean      Std Dev  Min    Max   Count\n",
        "back.                0.128915     1.110062    0     14    2203\n",
        "buffer_overflow.    91.700000    97.514685    0    321      30\n",
        "ftp_write.          32.375000    47.449033    0    134       8\n",
        "guess_passwd.        2.716981    11.879811    0     60      53\n",
        "imap.                6.000000    14.174240    0     41      12\n",
        "ipsweep.             0.034483     0.438439    0      7    1247\n",
        "land.                0.000000     0.000000    0      0      21\n",
        "loadmodule.         36.222222    41.408869    0    103       9\n",
        "multihop.          184.000000   253.851006    0    718       7\n",
        "neptune.             0.000000     0.000000    0      0  107201\n",
        "nmap.                0.000000     0.000000    0      0     231\n",
        "normal.            216.657322  1359.213469    0  58329   97278\n",
        "perl.               41.333333    14.843629   25     54       3\n",
        "phf.                 4.500000     5.744563    0     12       4\n",
        "pod.                 0.000000     0.000000    0      0     264\n",
        "portsweep.        1915.299038  7285.125159    0  42448    1040\n",
        "rootkit.           100.800000   216.185003    0    708      10\n",
        "satan.               0.040277     0.522433    0     11    1589\n",
        "smurf.               0.000000     0.000000    0      0  280790\n",
        "spy.               318.000000    26.870058  299    337       2\n",
        "teardrop.            0.000000     0.000000    0      0     979\n",
        "warezclient.       615.257843  2207.694966    0  15168    1020\n",
        "warezmaster.        15.050000    33.385271    0    156      20"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now for the next numeric column in the dataset, *src_bytes*.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print \"src_bytes statistics, by label\"\n",
      "get_variable_stats_df(stats_by_label,1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "src_bytes statistics, by label\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>Mean</th>\n",
        "      <th>Std Dev</th>\n",
        "      <th>Min</th>\n",
        "      <th>Max</th>\n",
        "      <th>Count</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>back.</th>\n",
        "      <td>  54156.355878</td>\n",
        "      <td>     3159.360232</td>\n",
        "      <td> 13140</td>\n",
        "      <td>     54540</td>\n",
        "      <td>   2203</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>buffer_overflow.</th>\n",
        "      <td>   1400.433333</td>\n",
        "      <td>     1337.132616</td>\n",
        "      <td>     0</td>\n",
        "      <td>      6274</td>\n",
        "      <td>     30</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ftp_write.</th>\n",
        "      <td>    220.750000</td>\n",
        "      <td>      267.747616</td>\n",
        "      <td>     0</td>\n",
        "      <td>       676</td>\n",
        "      <td>      8</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>guess_passwd.</th>\n",
        "      <td>    125.339623</td>\n",
        "      <td>        3.037860</td>\n",
        "      <td>   104</td>\n",
        "      <td>       126</td>\n",
        "      <td>     53</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>imap.</th>\n",
        "      <td>    347.583333</td>\n",
        "      <td>      629.926036</td>\n",
        "      <td>     0</td>\n",
        "      <td>      1492</td>\n",
        "      <td>     12</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ipsweep.</th>\n",
        "      <td>     10.083400</td>\n",
        "      <td>        5.231658</td>\n",
        "      <td>     0</td>\n",
        "      <td>        18</td>\n",
        "      <td>   1247</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>land.</th>\n",
        "      <td>      0.000000</td>\n",
        "      <td>        0.000000</td>\n",
        "      <td>     0</td>\n",
        "      <td>         0</td>\n",
        "      <td>     21</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>loadmodule.</th>\n",
        "      <td>    151.888889</td>\n",
        "      <td>      127.745298</td>\n",
        "      <td>     0</td>\n",
        "      <td>       302</td>\n",
        "      <td>      9</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>multihop.</th>\n",
        "      <td>    435.142857</td>\n",
        "      <td>      540.960389</td>\n",
        "      <td>     0</td>\n",
        "      <td>      1412</td>\n",
        "      <td>      7</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>neptune.</th>\n",
        "      <td>      0.000000</td>\n",
        "      <td>        0.000000</td>\n",
        "      <td>     0</td>\n",
        "      <td>         0</td>\n",
        "      <td> 107201</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>nmap.</th>\n",
        "      <td>     24.116883</td>\n",
        "      <td>       59.419871</td>\n",
        "      <td>     0</td>\n",
        "      <td>       207</td>\n",
        "      <td>    231</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>normal.</th>\n",
        "      <td>   1157.047524</td>\n",
        "      <td>    34226.124718</td>\n",
        "      <td>     0</td>\n",
        "      <td>   2194619</td>\n",
        "      <td>  97278</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>perl.</th>\n",
        "      <td>    265.666667</td>\n",
        "      <td>        4.932883</td>\n",
        "      <td>   260</td>\n",
        "      <td>       269</td>\n",
        "      <td>      3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>phf.</th>\n",
        "      <td>     51.000000</td>\n",
        "      <td>        0.000000</td>\n",
        "      <td>    51</td>\n",
        "      <td>        51</td>\n",
        "      <td>      4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>pod.</th>\n",
        "      <td>   1462.651515</td>\n",
        "      <td>      125.098044</td>\n",
        "      <td>   564</td>\n",
        "      <td>      1480</td>\n",
        "      <td>    264</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>portsweep.</th>\n",
        "      <td> 666707.436538</td>\n",
        "      <td> 21500665.866700</td>\n",
        "      <td>     0</td>\n",
        "      <td> 693375640</td>\n",
        "      <td>   1040</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>rootkit.</th>\n",
        "      <td>    294.700000</td>\n",
        "      <td>      538.578180</td>\n",
        "      <td>     0</td>\n",
        "      <td>      1727</td>\n",
        "      <td>     10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>satan.</th>\n",
        "      <td>      1.337319</td>\n",
        "      <td>       42.946200</td>\n",
        "      <td>     0</td>\n",
        "      <td>      1710</td>\n",
        "      <td>   1589</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>smurf.</th>\n",
        "      <td>    935.772300</td>\n",
        "      <td>      200.022386</td>\n",
        "      <td>   520</td>\n",
        "      <td>      1032</td>\n",
        "      <td> 280790</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>spy.</th>\n",
        "      <td>    174.500000</td>\n",
        "      <td>       88.388348</td>\n",
        "      <td>   112</td>\n",
        "      <td>       237</td>\n",
        "      <td>      2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>teardrop.</th>\n",
        "      <td>     28.000000</td>\n",
        "      <td>        0.000000</td>\n",
        "      <td>    28</td>\n",
        "      <td>        28</td>\n",
        "      <td>    979</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>warezclient.</th>\n",
        "      <td> 300219.562745</td>\n",
        "      <td>  1200905.243130</td>\n",
        "      <td>    30</td>\n",
        "      <td>   5135678</td>\n",
        "      <td>   1020</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>warezmaster.</th>\n",
        "      <td>     49.300000</td>\n",
        "      <td>      212.155132</td>\n",
        "      <td>     0</td>\n",
        "      <td>       950</td>\n",
        "      <td>     20</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "                           Mean          Std Dev    Min        Max   Count\n",
        "back.              54156.355878      3159.360232  13140      54540    2203\n",
        "buffer_overflow.    1400.433333      1337.132616      0       6274      30\n",
        "ftp_write.           220.750000       267.747616      0        676       8\n",
        "guess_passwd.        125.339623         3.037860    104        126      53\n",
        "imap.                347.583333       629.926036      0       1492      12\n",
        "ipsweep.              10.083400         5.231658      0         18    1247\n",
        "land.                  0.000000         0.000000      0          0      21\n",
        "loadmodule.          151.888889       127.745298      0        302       9\n",
        "multihop.            435.142857       540.960389      0       1412       7\n",
        "neptune.               0.000000         0.000000      0          0  107201\n",
        "nmap.                 24.116883        59.419871      0        207     231\n",
        "normal.             1157.047524     34226.124718      0    2194619   97278\n",
        "perl.                265.666667         4.932883    260        269       3\n",
        "phf.                  51.000000         0.000000     51         51       4\n",
        "pod.                1462.651515       125.098044    564       1480     264\n",
        "portsweep.        666707.436538  21500665.866700      0  693375640    1040\n",
        "rootkit.             294.700000       538.578180      0       1727      10\n",
        "satan.                 1.337319        42.946200      0       1710    1589\n",
        "smurf.               935.772300       200.022386    520       1032  280790\n",
        "spy.                 174.500000        88.388348    112        237       2\n",
        "teardrop.             28.000000         0.000000     28         28     979\n",
        "warezclient.      300219.562745   1200905.243130     30    5135678    1020\n",
        "warezmaster.          49.300000       212.155132      0        950      20"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And so on. By reusing the `summary_by_label` and `get_variable_stats_df` functions, we can perform exploratory data analysis on large datasets with Spark.  "
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Correlations"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Spark's MLlib supports [Pearson\u2019s](http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient) and [Spearman\u2019s](http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) methods for calculating pairwise correlations among many series. Both are provided by the `corr` method of the `Statistics` class in the `pyspark.mllib.stat` package.  "
     ]
    },
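    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The difference between the two methods can be illustrated on a small local example. Pearson's coefficient measures how well a *linear* relation fits two series. Spearman's coefficient is Pearson's coefficient applied to the *ranks* of the values, so it only measures how monotonic the relation is. The next cell is a minimal pure-Python sketch of these textbook definitions, assuming that tied values receive the average of their ranks. The helpers `pearson`, `average_ranks` and `spearman` and the toy series are hypothetical and for illustration only; this is not MLlib's implementation. For a monotonic but strongly non-linear pair of series, Spearman's coefficient comes out as exactly 1 while Pearson's is noticeably lower.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Spearman's rank correlation sketched as Pearson's correlation on ranks.\n",
      "# Pure Python on small local lists; hypothetical helpers, not MLlib's code.\n",
      "from __future__ import division, print_function\n",
      "\n",
      "def pearson(xs, ys):\n",
      "    # Pearson product-moment correlation of two equal-length sequences\n",
      "    n = len(xs)\n",
      "    mean_x = sum(xs) / n\n",
      "    mean_y = sum(ys) / n\n",
      "    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))\n",
      "    var_x = sum((x - mean_x) ** 2 for x in xs)\n",
      "    var_y = sum((y - mean_y) ** 2 for y in ys)\n",
      "    return cov / (var_x * var_y) ** 0.5\n",
      "\n",
      "def average_ranks(values):\n",
      "    # 1-based ranks; tied values share the average of their positions\n",
      "    order = sorted(range(len(values)), key=lambda i: values[i])\n",
      "    ranks = [0.0] * len(values)\n",
      "    i = 0\n",
      "    while i < len(order):\n",
      "        j = i\n",
      "        # extend j to the end of the run of values tied with values[order[i]]\n",
      "        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:\n",
      "            j += 1\n",
      "        avg_rank = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based\n",
      "        for k in range(i, j + 1):\n",
      "            ranks[order[k]] = avg_rank\n",
      "        i = j + 1\n",
      "    return ranks\n",
      "\n",
      "def spearman(xs, ys):\n",
      "    # Spearman's coefficient: Pearson's coefficient computed on the ranks\n",
      "    return pearson(average_ranks(xs), average_ranks(ys))\n",
      "\n",
      "xs = [1, 2, 3, 4, 5]\n",
      "ys = [1, 4, 9, 16, 1000]  # monotonic increasing, but far from linear\n",
      "pearson_value = pearson(xs, ys)\n",
      "spearman_value = spearman(xs, ys)\n",
      "print('Pearson: %.3f, Spearman: %.3f' % (pearson_value, spearman_value))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },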
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
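      "There are two options for the input: either two `RDD[Double]`s or a single `RDD[Vector]`. In the first case the output is a single `Double` value, while in the second it is a whole correlation matrix (in PySpark, a Python `float` and a NumPy array, respectively). Since our data (`vector_data`) holds one numeric vector per network interaction, we will obtain the second.    "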
     ]
    },
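    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The difference between the two input shapes can be seen without a cluster. The next cell uses NumPy's `corrcoef` as a local, single-machine analogue. This is an assumption for illustration only: it is not the MLlib API, and it computes Pearson's coefficient only. Two 1-D series give a single coefficient. A 2-D array of observations gives a symmetric matrix with one row and one column per variable and ones on the diagonal. The variable names and toy values are hypothetical.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Local NumPy analogue of the two input shapes of Statistics.corr (illustration only).\n",
      "import numpy as np\n",
      "\n",
      "# Two series -> one coefficient, like passing two RDD[Double]s\n",
      "x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])\n",
      "y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])\n",
      "pair_corr = np.corrcoef(x, y)[0, 1]\n",
      "\n",
      "# Rows of observations -> full matrix, like passing an RDD[Vector]\n",
      "# (each row is one observation, each column one variable; columns 0 and 1 are x and y)\n",
      "rows = np.column_stack([x, y, np.array([10.0, 9.0, 7.0, 4.0, 1.0])])\n",
      "matrix_corr = np.corrcoef(rows, rowvar=False)  # shape (3, 3)\n",
      "\n",
      "print(pair_corr)\n",
      "print(matrix_corr)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },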
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from pyspark.mllib.stat import Statistics \n",
      "correlation_matrix = Statistics.corr(vector_data, method=\"spearman\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
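      "Once the correlations are computed, we can start inspecting their values. To make the 38 x 38 matrix easier to read, we wrap it in a pandas `DataFrame` labelled with the feature names.  "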
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "pd.set_option('display.max_columns', 50)\n",
      "\n",
      "col_names = [\"duration\",\"src_bytes\",\"dst_bytes\",\"land\",\"wrong_fragment\",\n",
      "             \"urgent\",\"hot\",\"num_failed_logins\",\"logged_in\",\"num_compromised\",\n",
      "             \"root_shell\",\"su_attempted\",\"num_root\",\"num_file_creations\",\n",
      "             \"num_shells\",\"num_access_files\",\"num_outbound_cmds\",\n",
      "             \"is_hot_login\",\"is_guest_login\",\"count\",\"srv_count\",\"serror_rate\",\n",
      "             \"srv_serror_rate\",\"rerror_rate\",\"srv_rerror_rate\",\"same_srv_rate\",\n",
      "             \"diff_srv_rate\",\"srv_diff_host_rate\",\"dst_host_count\",\"dst_host_srv_count\",\n",
      "             \"dst_host_same_srv_rate\",\"dst_host_diff_srv_rate\",\"dst_host_same_src_port_rate\",\n",
      "             \"dst_host_srv_diff_host_rate\",\"dst_host_serror_rate\",\"dst_host_srv_serror_rate\",\n",
      "             \"dst_host_rerror_rate\",\"dst_host_srv_rerror_rate\"]\n",
      "\n",
      "corr_df = pd.DataFrame(correlation_matrix, index=col_names, columns=col_names)\n",
      "\n",
      "corr_df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>duration</th>\n",
        "      <th>src_bytes</th>\n",
        "      <th>dst_bytes</th>\n",
        "      <th>land</th>\n",
        "      <th>wrong_fragment</th>\n",
        "      <th>urgent</th>\n",
        "      <th>hot</th>\n",
        "      <th>num_failed_logins</th>\n",
        "      <th>logged_in</th>\n",
        "      <th>num_compromised</th>\n",
        "      <th>root_shell</th>\n",
        "      <th>su_attempted</th>\n",
        "      <th>num_root</th>\n",
        "      <th>num_file_creations</th>\n",
        "      <th>num_shells</th>\n",
        "      <th>num_access_files</th>\n",
        "      <th>num_outbound_cmds</th>\n",
        "      <th>is_hot_login</th>\n",
        "      <th>is_guest_login</th>\n",
        "      <th>count</th>\n",
        "      <th>srv_count</th>\n",
        "      <th>serror_rate</th>\n",
        "      <th>srv_serror_rate</th>\n",
        "      <th>rerror_rate</th>\n",
        "      <th>srv_rerror_rate</th>\n",
        "      <th>same_srv_rate</th>\n",
        "      <th>diff_srv_rate</th>\n",
        "      <th>srv_diff_host_rate</th>\n",
        "      <th>dst_host_count</th>\n",
        "      <th>dst_host_srv_count</th>\n",
        "      <th>dst_host_same_srv_rate</th>\n",
        "      <th>dst_host_diff_srv_rate</th>\n",
        "      <th>dst_host_same_src_port_rate</th>\n",
        "      <th>dst_host_srv_diff_host_rate</th>\n",
        "      <th>dst_host_serror_rate</th>\n",
        "      <th>dst_host_srv_serror_rate</th>\n",
        "      <th>dst_host_rerror_rate</th>\n",
        "      <th>dst_host_srv_rerror_rate</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>duration</th>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.014196</td>\n",
        "      <td> 0.299189</td>\n",
        "      <td>-0.001068</td>\n",
        "      <td>-0.008025</td>\n",
        "      <td> 0.017883</td>\n",
        "      <td> 0.108639</td>\n",
        "      <td> 0.014363</td>\n",
        "      <td> 0.159564</td>\n",
        "      <td> 0.010687</td>\n",
        "      <td> 0.040425</td>\n",
        "      <td> 0.026015</td>\n",
        "      <td> 0.013401</td>\n",
        "      <td> 0.061099</td>\n",
        "      <td> 0.008632</td>\n",
        "      <td> 0.019407</td>\n",
        "      <td>-0.000019</td>\n",
        "      <td>-0.000010</td>\n",
        "      <td> 0.205606</td>\n",
        "      <td>-0.259032</td>\n",
        "      <td>-0.250139</td>\n",
        "      <td>-0.074211</td>\n",
        "      <td>-0.073663</td>\n",
        "      <td>-0.025936</td>\n",
        "      <td>-0.026420</td>\n",
        "      <td> 0.062291</td>\n",
        "      <td>-0.050875</td>\n",
        "      <td> 0.123621</td>\n",
        "      <td>-0.161107</td>\n",
        "      <td>-0.217167</td>\n",
        "      <td>-0.211979</td>\n",
        "      <td> 0.231644</td>\n",
        "      <td>-0.065202</td>\n",
        "      <td> 0.100692</td>\n",
        "      <td>-0.056753</td>\n",
        "      <td>-0.057298</td>\n",
        "      <td>-0.007759</td>\n",
        "      <td>-0.013891</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>src_bytes</th>\n",
        "      <td> 0.014196</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.167931</td>\n",
        "      <td>-0.009404</td>\n",
        "      <td>-0.019358</td>\n",
        "      <td> 0.000094</td>\n",
        "      <td> 0.113920</td>\n",
        "      <td>-0.008396</td>\n",
        "      <td>-0.089702</td>\n",
        "      <td> 0.118562</td>\n",
        "      <td> 0.003067</td>\n",
        "      <td> 0.002282</td>\n",
        "      <td>-0.002050</td>\n",
        "      <td> 0.027710</td>\n",
        "      <td> 0.014403</td>\n",
        "      <td>-0.001497</td>\n",
        "      <td> 0.000010</td>\n",
        "      <td> 0.000019</td>\n",
        "      <td> 0.027511</td>\n",
        "      <td> 0.666230</td>\n",
        "      <td> 0.722609</td>\n",
        "      <td>-0.657460</td>\n",
        "      <td>-0.652391</td>\n",
        "      <td>-0.342180</td>\n",
        "      <td>-0.332977</td>\n",
        "      <td> 0.744046</td>\n",
        "      <td>-0.739988</td>\n",
        "      <td>-0.104042</td>\n",
        "      <td> 0.130377</td>\n",
        "      <td> 0.741979</td>\n",
        "      <td> 0.729151</td>\n",
        "      <td>-0.712965</td>\n",
        "      <td> 0.815039</td>\n",
        "      <td>-0.140231</td>\n",
        "      <td>-0.645920</td>\n",
        "      <td>-0.641792</td>\n",
        "      <td>-0.297338</td>\n",
        "      <td>-0.300581</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_bytes</th>\n",
        "      <td> 0.299189</td>\n",
        "      <td>-0.167931</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.003040</td>\n",
        "      <td>-0.022659</td>\n",
        "      <td> 0.007234</td>\n",
        "      <td> 0.193156</td>\n",
        "      <td> 0.021952</td>\n",
        "      <td> 0.882185</td>\n",
        "      <td> 0.169772</td>\n",
        "      <td> 0.026054</td>\n",
        "      <td> 0.012192</td>\n",
        "      <td>-0.003884</td>\n",
        "      <td> 0.034154</td>\n",
        "      <td>-0.000054</td>\n",
        "      <td> 0.065776</td>\n",
        "      <td>-0.000031</td>\n",
        "      <td> 0.000041</td>\n",
        "      <td> 0.085947</td>\n",
        "      <td>-0.639157</td>\n",
        "      <td>-0.497683</td>\n",
        "      <td>-0.205848</td>\n",
        "      <td>-0.198715</td>\n",
        "      <td>-0.100958</td>\n",
        "      <td>-0.081307</td>\n",
        "      <td> 0.229677</td>\n",
        "      <td>-0.222572</td>\n",
        "      <td> 0.521003</td>\n",
        "      <td>-0.611972</td>\n",
        "      <td> 0.024124</td>\n",
        "      <td> 0.055033</td>\n",
        "      <td>-0.035073</td>\n",
        "      <td>-0.396195</td>\n",
        "      <td> 0.578557</td>\n",
        "      <td>-0.167047</td>\n",
        "      <td>-0.158378</td>\n",
        "      <td>-0.003042</td>\n",
        "      <td> 0.001621</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>land</th>\n",
        "      <td>-0.001068</td>\n",
        "      <td>-0.009404</td>\n",
        "      <td>-0.003040</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.000333</td>\n",
        "      <td>-0.000065</td>\n",
        "      <td>-0.000539</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td>-0.002785</td>\n",
        "      <td>-0.000447</td>\n",
        "      <td>-0.000093</td>\n",
        "      <td>-0.000049</td>\n",
        "      <td>-0.000230</td>\n",
        "      <td>-0.000150</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td>-0.000211</td>\n",
        "      <td>-0.002881</td>\n",
        "      <td> 0.002089</td>\n",
        "      <td>-0.000250</td>\n",
        "      <td>-0.010939</td>\n",
        "      <td>-0.010128</td>\n",
        "      <td> 0.014160</td>\n",
        "      <td> 0.014342</td>\n",
        "      <td>-0.000451</td>\n",
        "      <td>-0.001690</td>\n",
        "      <td> 0.002153</td>\n",
        "      <td>-0.001846</td>\n",
        "      <td> 0.020678</td>\n",
        "      <td>-0.019923</td>\n",
        "      <td>-0.012341</td>\n",
        "      <td> 0.002576</td>\n",
        "      <td>-0.001803</td>\n",
        "      <td> 0.004265</td>\n",
        "      <td> 0.016171</td>\n",
        "      <td> 0.013566</td>\n",
        "      <td> 0.012265</td>\n",
        "      <td> 0.000389</td>\n",
        "      <td>-0.001816</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>wrong_fragment</th>\n",
        "      <td>-0.008025</td>\n",
        "      <td>-0.019358</td>\n",
        "      <td>-0.022659</td>\n",
        "      <td>-0.000333</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.000150</td>\n",
        "      <td>-0.004042</td>\n",
        "      <td>-0.000568</td>\n",
        "      <td>-0.020911</td>\n",
        "      <td>-0.003370</td>\n",
        "      <td>-0.000528</td>\n",
        "      <td>-0.000248</td>\n",
        "      <td>-0.001727</td>\n",
        "      <td>-0.001160</td>\n",
        "      <td>-0.000507</td>\n",
        "      <td>-0.001519</td>\n",
        "      <td>-0.000147</td>\n",
        "      <td> 0.000441</td>\n",
        "      <td>-0.001869</td>\n",
        "      <td>-0.057711</td>\n",
        "      <td>-0.029117</td>\n",
        "      <td>-0.008849</td>\n",
        "      <td>-0.023382</td>\n",
        "      <td> 0.000430</td>\n",
        "      <td>-0.012676</td>\n",
        "      <td> 0.010218</td>\n",
        "      <td>-0.009386</td>\n",
        "      <td> 0.012117</td>\n",
        "      <td>-0.029149</td>\n",
        "      <td>-0.058225</td>\n",
        "      <td>-0.049560</td>\n",
        "      <td> 0.055542</td>\n",
        "      <td>-0.015449</td>\n",
        "      <td> 0.007306</td>\n",
        "      <td> 0.010387</td>\n",
        "      <td>-0.024117</td>\n",
        "      <td> 0.046656</td>\n",
        "      <td>-0.013666</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>urgent</th>\n",
        "      <td> 0.017883</td>\n",
        "      <td> 0.000094</td>\n",
        "      <td> 0.007234</td>\n",
        "      <td>-0.000065</td>\n",
        "      <td>-0.000150</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.008594</td>\n",
        "      <td> 0.063009</td>\n",
        "      <td> 0.006821</td>\n",
        "      <td> 0.031765</td>\n",
        "      <td> 0.067437</td>\n",
        "      <td> 0.000020</td>\n",
        "      <td> 0.061994</td>\n",
        "      <td> 0.061383</td>\n",
        "      <td>-0.000066</td>\n",
        "      <td> 0.023380</td>\n",
        "      <td> 0.012879</td>\n",
        "      <td> 0.005162</td>\n",
        "      <td>-0.000100</td>\n",
        "      <td>-0.004778</td>\n",
        "      <td>-0.004799</td>\n",
        "      <td>-0.001338</td>\n",
        "      <td>-0.001327</td>\n",
        "      <td>-0.000705</td>\n",
        "      <td>-0.000726</td>\n",
        "      <td> 0.001521</td>\n",
        "      <td>-0.001522</td>\n",
        "      <td>-0.000788</td>\n",
        "      <td>-0.005894</td>\n",
        "      <td>-0.005698</td>\n",
        "      <td>-0.004078</td>\n",
        "      <td> 0.005208</td>\n",
        "      <td>-0.001939</td>\n",
        "      <td>-0.000976</td>\n",
        "      <td>-0.001381</td>\n",
        "      <td>-0.001370</td>\n",
        "      <td>-0.000786</td>\n",
        "      <td>-0.000782</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hot</th>\n",
        "      <td> 0.108639</td>\n",
        "      <td> 0.113920</td>\n",
        "      <td> 0.193156</td>\n",
        "      <td>-0.000539</td>\n",
        "      <td>-0.004042</td>\n",
        "      <td> 0.008594</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.112560</td>\n",
        "      <td> 0.189126</td>\n",
        "      <td> 0.811529</td>\n",
        "      <td> 0.101983</td>\n",
        "      <td>-0.000400</td>\n",
        "      <td> 0.003096</td>\n",
        "      <td> 0.028694</td>\n",
        "      <td> 0.009146</td>\n",
        "      <td> 0.004224</td>\n",
        "      <td>-0.000393</td>\n",
        "      <td>-0.000248</td>\n",
        "      <td> 0.463706</td>\n",
        "      <td>-0.120847</td>\n",
        "      <td>-0.114735</td>\n",
        "      <td>-0.035487</td>\n",
        "      <td>-0.034934</td>\n",
        "      <td> 0.013468</td>\n",
        "      <td> 0.052003</td>\n",
        "      <td> 0.041342</td>\n",
        "      <td>-0.040555</td>\n",
        "      <td> 0.032141</td>\n",
        "      <td>-0.074178</td>\n",
        "      <td>-0.017960</td>\n",
        "      <td> 0.018783</td>\n",
        "      <td>-0.017198</td>\n",
        "      <td>-0.086998</td>\n",
        "      <td>-0.014141</td>\n",
        "      <td>-0.004706</td>\n",
        "      <td>-0.010721</td>\n",
        "      <td> 0.199019</td>\n",
        "      <td> 0.189142</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_failed_logins</th>\n",
        "      <td> 0.014363</td>\n",
        "      <td>-0.008396</td>\n",
        "      <td> 0.021952</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td>-0.000568</td>\n",
        "      <td> 0.063009</td>\n",
        "      <td> 0.112560</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.002190</td>\n",
        "      <td> 0.004619</td>\n",
        "      <td> 0.016895</td>\n",
        "      <td> 0.072748</td>\n",
        "      <td> 0.010060</td>\n",
        "      <td> 0.015211</td>\n",
        "      <td>-0.000093</td>\n",
        "      <td> 0.005581</td>\n",
        "      <td> 0.003431</td>\n",
        "      <td>-0.001560</td>\n",
        "      <td>-0.000428</td>\n",
        "      <td>-0.018024</td>\n",
        "      <td>-0.018027</td>\n",
        "      <td>-0.003674</td>\n",
        "      <td>-0.004027</td>\n",
        "      <td> 0.035324</td>\n",
        "      <td> 0.034876</td>\n",
        "      <td> 0.005716</td>\n",
        "      <td>-0.005538</td>\n",
        "      <td>-0.003096</td>\n",
        "      <td>-0.028369</td>\n",
        "      <td>-0.015092</td>\n",
        "      <td> 0.003004</td>\n",
        "      <td>-0.002960</td>\n",
        "      <td>-0.006617</td>\n",
        "      <td>-0.002588</td>\n",
        "      <td> 0.014713</td>\n",
        "      <td> 0.014914</td>\n",
        "      <td> 0.032395</td>\n",
        "      <td> 0.032151</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>logged_in</th>\n",
        "      <td> 0.159564</td>\n",
        "      <td>-0.089702</td>\n",
        "      <td> 0.882185</td>\n",
        "      <td>-0.002785</td>\n",
        "      <td>-0.020911</td>\n",
        "      <td> 0.006821</td>\n",
        "      <td> 0.189126</td>\n",
        "      <td>-0.002190</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.161190</td>\n",
        "      <td> 0.025293</td>\n",
        "      <td> 0.011813</td>\n",
        "      <td> 0.082533</td>\n",
        "      <td> 0.055530</td>\n",
        "      <td> 0.024354</td>\n",
        "      <td> 0.072698</td>\n",
        "      <td> 0.000079</td>\n",
        "      <td> 0.000127</td>\n",
        "      <td> 0.089318</td>\n",
        "      <td>-0.578287</td>\n",
        "      <td>-0.438947</td>\n",
        "      <td>-0.187114</td>\n",
        "      <td>-0.180122</td>\n",
        "      <td>-0.091962</td>\n",
        "      <td>-0.072287</td>\n",
        "      <td> 0.216969</td>\n",
        "      <td>-0.214019</td>\n",
        "      <td> 0.503807</td>\n",
        "      <td>-0.682721</td>\n",
        "      <td> 0.080352</td>\n",
        "      <td> 0.114526</td>\n",
        "      <td>-0.093565</td>\n",
        "      <td>-0.359506</td>\n",
        "      <td> 0.659078</td>\n",
        "      <td>-0.143283</td>\n",
        "      <td>-0.132474</td>\n",
        "      <td> 0.007236</td>\n",
        "      <td> 0.012979</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_compromised</th>\n",
        "      <td> 0.010687</td>\n",
        "      <td> 0.118562</td>\n",
        "      <td> 0.169772</td>\n",
        "      <td>-0.000447</td>\n",
        "      <td>-0.003370</td>\n",
        "      <td> 0.031765</td>\n",
        "      <td> 0.811529</td>\n",
        "      <td> 0.004619</td>\n",
        "      <td> 0.161190</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.085558</td>\n",
        "      <td> 0.048985</td>\n",
        "      <td> 0.028557</td>\n",
        "      <td> 0.031223</td>\n",
        "      <td> 0.011256</td>\n",
        "      <td> 0.006977</td>\n",
        "      <td> 0.001048</td>\n",
        "      <td>-0.000438</td>\n",
        "      <td>-0.002504</td>\n",
        "      <td>-0.097212</td>\n",
        "      <td>-0.091154</td>\n",
        "      <td>-0.030516</td>\n",
        "      <td>-0.030264</td>\n",
        "      <td> 0.008573</td>\n",
        "      <td> 0.054006</td>\n",
        "      <td> 0.035253</td>\n",
        "      <td>-0.034953</td>\n",
        "      <td> 0.036497</td>\n",
        "      <td>-0.041615</td>\n",
        "      <td> 0.003465</td>\n",
        "      <td> 0.038980</td>\n",
        "      <td>-0.039091</td>\n",
        "      <td>-0.078843</td>\n",
        "      <td>-0.020979</td>\n",
        "      <td>-0.005019</td>\n",
        "      <td>-0.004504</td>\n",
        "      <td> 0.214115</td>\n",
        "      <td> 0.217858</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>root_shell</th>\n",
        "      <td> 0.040425</td>\n",
        "      <td> 0.003067</td>\n",
        "      <td> 0.026054</td>\n",
        "      <td>-0.000093</td>\n",
        "      <td>-0.000528</td>\n",
        "      <td> 0.067437</td>\n",
        "      <td> 0.101983</td>\n",
        "      <td> 0.016895</td>\n",
        "      <td> 0.025293</td>\n",
        "      <td> 0.085558</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.233486</td>\n",
        "      <td> 0.094512</td>\n",
        "      <td> 0.140650</td>\n",
        "      <td> 0.132056</td>\n",
        "      <td> 0.069353</td>\n",
        "      <td> 0.011462</td>\n",
        "      <td>-0.006602</td>\n",
        "      <td>-0.000405</td>\n",
        "      <td>-0.016409</td>\n",
        "      <td>-0.015174</td>\n",
        "      <td>-0.004952</td>\n",
        "      <td>-0.004923</td>\n",
        "      <td>-0.001104</td>\n",
        "      <td>-0.001143</td>\n",
        "      <td> 0.004946</td>\n",
        "      <td>-0.004553</td>\n",
        "      <td> 0.002286</td>\n",
        "      <td>-0.021367</td>\n",
        "      <td>-0.011906</td>\n",
        "      <td> 0.000515</td>\n",
        "      <td>-0.000916</td>\n",
        "      <td>-0.004617</td>\n",
        "      <td> 0.008631</td>\n",
        "      <td>-0.003498</td>\n",
        "      <td>-0.003032</td>\n",
        "      <td> 0.002763</td>\n",
        "      <td> 0.002151</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>su_attempted</th>\n",
        "      <td> 0.026015</td>\n",
        "      <td> 0.002282</td>\n",
        "      <td> 0.012192</td>\n",
        "      <td>-0.000049</td>\n",
        "      <td>-0.000248</td>\n",
        "      <td> 0.000020</td>\n",
        "      <td>-0.000400</td>\n",
        "      <td> 0.072748</td>\n",
        "      <td> 0.011813</td>\n",
        "      <td> 0.048985</td>\n",
        "      <td> 0.233486</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.119326</td>\n",
        "      <td> 0.053110</td>\n",
        "      <td> 0.040487</td>\n",
        "      <td> 0.081272</td>\n",
        "      <td>-0.018896</td>\n",
        "      <td> 0.012927</td>\n",
        "      <td>-0.000219</td>\n",
        "      <td>-0.008279</td>\n",
        "      <td>-0.008225</td>\n",
        "      <td>-0.002318</td>\n",
        "      <td>-0.002295</td>\n",
        "      <td>-0.001227</td>\n",
        "      <td>-0.001253</td>\n",
        "      <td> 0.002634</td>\n",
        "      <td>-0.002649</td>\n",
        "      <td> 0.000348</td>\n",
        "      <td>-0.006697</td>\n",
        "      <td>-0.006288</td>\n",
        "      <td>-0.005738</td>\n",
        "      <td> 0.006687</td>\n",
        "      <td>-0.005020</td>\n",
        "      <td> 0.001052</td>\n",
        "      <td> 0.001974</td>\n",
        "      <td> 0.002893</td>\n",
        "      <td> 0.003173</td>\n",
        "      <td> 0.001731</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_root</th>\n",
        "      <td> 0.013401</td>\n",
        "      <td>-0.002050</td>\n",
        "      <td>-0.003884</td>\n",
        "      <td>-0.000230</td>\n",
        "      <td>-0.001727</td>\n",
        "      <td> 0.061994</td>\n",
        "      <td> 0.003096</td>\n",
        "      <td> 0.010060</td>\n",
        "      <td> 0.082533</td>\n",
        "      <td> 0.028557</td>\n",
        "      <td> 0.094512</td>\n",
        "      <td> 0.119326</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.047521</td>\n",
        "      <td> 0.034405</td>\n",
        "      <td> 0.014513</td>\n",
        "      <td> 0.001524</td>\n",
        "      <td>-0.002585</td>\n",
        "      <td>-0.001281</td>\n",
        "      <td>-0.054721</td>\n",
        "      <td>-0.053530</td>\n",
        "      <td>-0.016031</td>\n",
        "      <td>-0.015936</td>\n",
        "      <td>-0.008610</td>\n",
        "      <td>-0.008708</td>\n",
        "      <td> 0.013881</td>\n",
        "      <td>-0.011337</td>\n",
        "      <td> 0.006316</td>\n",
        "      <td>-0.078717</td>\n",
        "      <td>-0.038689</td>\n",
        "      <td>-0.038935</td>\n",
        "      <td> 0.047414</td>\n",
        "      <td>-0.015968</td>\n",
        "      <td> 0.061030</td>\n",
        "      <td>-0.008457</td>\n",
        "      <td>-0.007096</td>\n",
        "      <td>-0.000421</td>\n",
        "      <td>-0.005012</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_file_creations</th>\n",
        "      <td> 0.061099</td>\n",
        "      <td> 0.027710</td>\n",
        "      <td> 0.034154</td>\n",
        "      <td>-0.000150</td>\n",
        "      <td>-0.001160</td>\n",
        "      <td> 0.061383</td>\n",
        "      <td> 0.028694</td>\n",
        "      <td> 0.015211</td>\n",
        "      <td> 0.055530</td>\n",
        "      <td> 0.031223</td>\n",
        "      <td> 0.140650</td>\n",
        "      <td> 0.053110</td>\n",
        "      <td> 0.047521</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.068660</td>\n",
        "      <td> 0.031042</td>\n",
        "      <td>-0.004081</td>\n",
        "      <td>-0.001664</td>\n",
        "      <td> 0.013242</td>\n",
        "      <td>-0.036467</td>\n",
        "      <td>-0.034598</td>\n",
        "      <td>-0.009703</td>\n",
        "      <td>-0.010390</td>\n",
        "      <td>-0.005069</td>\n",
        "      <td>-0.004775</td>\n",
        "      <td> 0.009784</td>\n",
        "      <td>-0.008711</td>\n",
        "      <td> 0.014412</td>\n",
        "      <td>-0.049529</td>\n",
        "      <td>-0.026890</td>\n",
        "      <td>-0.021731</td>\n",
        "      <td> 0.027092</td>\n",
        "      <td>-0.015018</td>\n",
        "      <td> 0.030590</td>\n",
        "      <td>-0.002257</td>\n",
        "      <td>-0.004295</td>\n",
        "      <td> 0.000626</td>\n",
        "      <td>-0.001096</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_shells</th>\n",
        "      <td> 0.008632</td>\n",
        "      <td> 0.014403</td>\n",
        "      <td>-0.000054</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td>-0.000507</td>\n",
        "      <td>-0.000066</td>\n",
        "      <td> 0.009146</td>\n",
        "      <td>-0.000093</td>\n",
        "      <td> 0.024354</td>\n",
        "      <td> 0.011256</td>\n",
        "      <td> 0.132056</td>\n",
        "      <td> 0.040487</td>\n",
        "      <td> 0.034405</td>\n",
        "      <td> 0.068660</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.019438</td>\n",
        "      <td>-0.002592</td>\n",
        "      <td>-0.006631</td>\n",
        "      <td>-0.000405</td>\n",
        "      <td>-0.013938</td>\n",
        "      <td>-0.011784</td>\n",
        "      <td>-0.004343</td>\n",
        "      <td>-0.004740</td>\n",
        "      <td>-0.002541</td>\n",
        "      <td>-0.002572</td>\n",
        "      <td> 0.004282</td>\n",
        "      <td>-0.003743</td>\n",
        "      <td> 0.001096</td>\n",
        "      <td>-0.021200</td>\n",
        "      <td>-0.012017</td>\n",
        "      <td>-0.009962</td>\n",
        "      <td> 0.010761</td>\n",
        "      <td>-0.003521</td>\n",
        "      <td> 0.015882</td>\n",
        "      <td>-0.001588</td>\n",
        "      <td>-0.002357</td>\n",
        "      <td>-0.000617</td>\n",
        "      <td>-0.002020</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_access_files</th>\n",
        "      <td> 0.019407</td>\n",
        "      <td>-0.001497</td>\n",
        "      <td> 0.065776</td>\n",
        "      <td>-0.000211</td>\n",
        "      <td>-0.001519</td>\n",
        "      <td> 0.023380</td>\n",
        "      <td> 0.004224</td>\n",
        "      <td> 0.005581</td>\n",
        "      <td> 0.072698</td>\n",
        "      <td> 0.006977</td>\n",
        "      <td> 0.069353</td>\n",
        "      <td> 0.081272</td>\n",
        "      <td> 0.014513</td>\n",
        "      <td> 0.031042</td>\n",
        "      <td> 0.019438</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.001597</td>\n",
        "      <td>-0.002850</td>\n",
        "      <td> 0.002466</td>\n",
        "      <td>-0.045282</td>\n",
        "      <td>-0.040497</td>\n",
        "      <td>-0.013945</td>\n",
        "      <td>-0.013572</td>\n",
        "      <td>-0.007581</td>\n",
        "      <td> 0.001874</td>\n",
        "      <td> 0.015499</td>\n",
        "      <td>-0.015112</td>\n",
        "      <td> 0.024266</td>\n",
        "      <td>-0.023865</td>\n",
        "      <td>-0.023657</td>\n",
        "      <td>-0.021358</td>\n",
        "      <td> 0.026703</td>\n",
        "      <td>-0.033288</td>\n",
        "      <td> 0.011765</td>\n",
        "      <td>-0.011197</td>\n",
        "      <td>-0.011487</td>\n",
        "      <td>-0.004743</td>\n",
        "      <td>-0.004552</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_outbound_cmds</th>\n",
        "      <td>-0.000019</td>\n",
        "      <td> 0.000010</td>\n",
        "      <td>-0.000031</td>\n",
        "      <td>-0.002881</td>\n",
        "      <td>-0.000147</td>\n",
        "      <td> 0.012879</td>\n",
        "      <td>-0.000393</td>\n",
        "      <td> 0.003431</td>\n",
        "      <td> 0.000079</td>\n",
        "      <td> 0.001048</td>\n",
        "      <td> 0.011462</td>\n",
        "      <td>-0.018896</td>\n",
        "      <td> 0.001524</td>\n",
        "      <td>-0.004081</td>\n",
        "      <td>-0.002592</td>\n",
        "      <td>-0.001597</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.822890</td>\n",
        "      <td> 0.000924</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td> 0.000100</td>\n",
        "      <td> 0.000167</td>\n",
        "      <td> 0.000209</td>\n",
        "      <td> 0.000536</td>\n",
        "      <td> 0.000346</td>\n",
        "      <td> 0.000208</td>\n",
        "      <td> 0.000328</td>\n",
        "      <td>-0.000141</td>\n",
        "      <td>-0.000424</td>\n",
        "      <td>-0.000280</td>\n",
        "      <td>-0.000503</td>\n",
        "      <td>-0.000181</td>\n",
        "      <td>-0.000455</td>\n",
        "      <td> 0.000288</td>\n",
        "      <td>-0.000011</td>\n",
        "      <td>-0.000372</td>\n",
        "      <td>-0.000823</td>\n",
        "      <td>-0.001038</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>is_hot_login</th>\n",
        "      <td>-0.000010</td>\n",
        "      <td> 0.000019</td>\n",
        "      <td> 0.000041</td>\n",
        "      <td> 0.002089</td>\n",
        "      <td> 0.000441</td>\n",
        "      <td> 0.005162</td>\n",
        "      <td>-0.000248</td>\n",
        "      <td>-0.001560</td>\n",
        "      <td> 0.000127</td>\n",
        "      <td>-0.000438</td>\n",
        "      <td>-0.006602</td>\n",
        "      <td> 0.012927</td>\n",
        "      <td>-0.002585</td>\n",
        "      <td>-0.001664</td>\n",
        "      <td>-0.006631</td>\n",
        "      <td>-0.002850</td>\n",
        "      <td> 0.822890</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.001512</td>\n",
        "      <td> 0.000036</td>\n",
        "      <td> 0.000064</td>\n",
        "      <td> 0.000102</td>\n",
        "      <td>-0.000302</td>\n",
        "      <td>-0.000550</td>\n",
        "      <td> 0.000457</td>\n",
        "      <td>-0.000159</td>\n",
        "      <td>-0.000235</td>\n",
        "      <td>-0.000360</td>\n",
        "      <td>-0.000106</td>\n",
        "      <td> 0.000206</td>\n",
        "      <td> 0.000229</td>\n",
        "      <td>-0.000004</td>\n",
        "      <td> 0.000283</td>\n",
        "      <td> 0.000538</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td>-0.000007</td>\n",
        "      <td>-0.000435</td>\n",
        "      <td>-0.000529</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>is_guest_login</th>\n",
        "      <td> 0.205606</td>\n",
        "      <td> 0.027511</td>\n",
        "      <td> 0.085947</td>\n",
        "      <td>-0.000250</td>\n",
        "      <td>-0.001869</td>\n",
        "      <td>-0.000100</td>\n",
        "      <td> 0.463706</td>\n",
        "      <td>-0.000428</td>\n",
        "      <td> 0.089318</td>\n",
        "      <td>-0.002504</td>\n",
        "      <td>-0.000405</td>\n",
        "      <td>-0.000219</td>\n",
        "      <td>-0.001281</td>\n",
        "      <td> 0.013242</td>\n",
        "      <td>-0.000405</td>\n",
        "      <td> 0.002466</td>\n",
        "      <td> 0.000924</td>\n",
        "      <td> 0.001512</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.062340</td>\n",
        "      <td>-0.062713</td>\n",
        "      <td>-0.017343</td>\n",
        "      <td>-0.017240</td>\n",
        "      <td>-0.008867</td>\n",
        "      <td>-0.009193</td>\n",
        "      <td> 0.018042</td>\n",
        "      <td>-0.017000</td>\n",
        "      <td>-0.008878</td>\n",
        "      <td>-0.055453</td>\n",
        "      <td>-0.044366</td>\n",
        "      <td>-0.041749</td>\n",
        "      <td> 0.044640</td>\n",
        "      <td>-0.038092</td>\n",
        "      <td>-0.012578</td>\n",
        "      <td>-0.001066</td>\n",
        "      <td>-0.016885</td>\n",
        "      <td> 0.025282</td>\n",
        "      <td>-0.004292</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>count</th>\n",
        "      <td>-0.259032</td>\n",
        "      <td> 0.666230</td>\n",
        "      <td>-0.639157</td>\n",
        "      <td>-0.010939</td>\n",
        "      <td>-0.057711</td>\n",
        "      <td>-0.004778</td>\n",
        "      <td>-0.120847</td>\n",
        "      <td>-0.018024</td>\n",
        "      <td>-0.578287</td>\n",
        "      <td>-0.097212</td>\n",
        "      <td>-0.016409</td>\n",
        "      <td>-0.008279</td>\n",
        "      <td>-0.054721</td>\n",
        "      <td>-0.036467</td>\n",
        "      <td>-0.013938</td>\n",
        "      <td>-0.045282</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td> 0.000036</td>\n",
        "      <td>-0.062340</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.950587</td>\n",
        "      <td>-0.303538</td>\n",
        "      <td>-0.308923</td>\n",
        "      <td>-0.213824</td>\n",
        "      <td>-0.221352</td>\n",
        "      <td> 0.346718</td>\n",
        "      <td>-0.361737</td>\n",
        "      <td>-0.384010</td>\n",
        "      <td> 0.547443</td>\n",
        "      <td> 0.586979</td>\n",
        "      <td> 0.539698</td>\n",
        "      <td>-0.546869</td>\n",
        "      <td> 0.776906</td>\n",
        "      <td>-0.496554</td>\n",
        "      <td>-0.331571</td>\n",
        "      <td>-0.335290</td>\n",
        "      <td>-0.261194</td>\n",
        "      <td>-0.256176</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_count</th>\n",
        "      <td>-0.250139</td>\n",
        "      <td> 0.722609</td>\n",
        "      <td>-0.497683</td>\n",
        "      <td>-0.010128</td>\n",
        "      <td>-0.029117</td>\n",
        "      <td>-0.004799</td>\n",
        "      <td>-0.114735</td>\n",
        "      <td>-0.018027</td>\n",
        "      <td>-0.438947</td>\n",
        "      <td>-0.091154</td>\n",
        "      <td>-0.015174</td>\n",
        "      <td>-0.008225</td>\n",
        "      <td>-0.053530</td>\n",
        "      <td>-0.034598</td>\n",
        "      <td>-0.011784</td>\n",
        "      <td>-0.040497</td>\n",
        "      <td> 0.000100</td>\n",
        "      <td> 0.000064</td>\n",
        "      <td>-0.062713</td>\n",
        "      <td> 0.950587</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.428185</td>\n",
        "      <td>-0.421424</td>\n",
        "      <td>-0.281468</td>\n",
        "      <td>-0.284034</td>\n",
        "      <td> 0.517227</td>\n",
        "      <td>-0.511998</td>\n",
        "      <td>-0.239057</td>\n",
        "      <td> 0.442611</td>\n",
        "      <td> 0.720746</td>\n",
        "      <td> 0.681955</td>\n",
        "      <td>-0.673916</td>\n",
        "      <td> 0.812280</td>\n",
        "      <td>-0.391712</td>\n",
        "      <td>-0.449096</td>\n",
        "      <td>-0.442823</td>\n",
        "      <td>-0.313442</td>\n",
        "      <td>-0.308132</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>serror_rate</th>\n",
        "      <td>-0.074211</td>\n",
        "      <td>-0.657460</td>\n",
        "      <td>-0.205848</td>\n",
        "      <td> 0.014160</td>\n",
        "      <td>-0.008849</td>\n",
        "      <td>-0.001338</td>\n",
        "      <td>-0.035487</td>\n",
        "      <td>-0.003674</td>\n",
        "      <td>-0.187114</td>\n",
        "      <td>-0.030516</td>\n",
        "      <td>-0.004952</td>\n",
        "      <td>-0.002318</td>\n",
        "      <td>-0.016031</td>\n",
        "      <td>-0.009703</td>\n",
        "      <td>-0.004343</td>\n",
        "      <td>-0.013945</td>\n",
        "      <td> 0.000167</td>\n",
        "      <td> 0.000102</td>\n",
        "      <td>-0.017343</td>\n",
        "      <td>-0.303538</td>\n",
        "      <td>-0.428185</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.990888</td>\n",
        "      <td>-0.091157</td>\n",
        "      <td>-0.095285</td>\n",
        "      <td>-0.851915</td>\n",
        "      <td> 0.828012</td>\n",
        "      <td>-0.121489</td>\n",
        "      <td> 0.165350</td>\n",
        "      <td>-0.724317</td>\n",
        "      <td>-0.745745</td>\n",
        "      <td> 0.719708</td>\n",
        "      <td>-0.650336</td>\n",
        "      <td>-0.153568</td>\n",
        "      <td> 0.973947</td>\n",
        "      <td> 0.965663</td>\n",
        "      <td>-0.103198</td>\n",
        "      <td>-0.105434</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_serror_rate</th>\n",
        "      <td>-0.073663</td>\n",
        "      <td>-0.652391</td>\n",
        "      <td>-0.198715</td>\n",
        "      <td> 0.014342</td>\n",
        "      <td>-0.023382</td>\n",
        "      <td>-0.001327</td>\n",
        "      <td>-0.034934</td>\n",
        "      <td>-0.004027</td>\n",
        "      <td>-0.180122</td>\n",
        "      <td>-0.030264</td>\n",
        "      <td>-0.004923</td>\n",
        "      <td>-0.002295</td>\n",
        "      <td>-0.015936</td>\n",
        "      <td>-0.010390</td>\n",
        "      <td>-0.004740</td>\n",
        "      <td>-0.013572</td>\n",
        "      <td> 0.000209</td>\n",
        "      <td>-0.000302</td>\n",
        "      <td>-0.017240</td>\n",
        "      <td>-0.308923</td>\n",
        "      <td>-0.421424</td>\n",
        "      <td> 0.990888</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.110664</td>\n",
        "      <td>-0.115286</td>\n",
        "      <td>-0.839315</td>\n",
        "      <td> 0.815305</td>\n",
        "      <td>-0.112222</td>\n",
        "      <td> 0.160322</td>\n",
        "      <td>-0.713313</td>\n",
        "      <td>-0.734334</td>\n",
        "      <td> 0.707753</td>\n",
        "      <td>-0.646256</td>\n",
        "      <td>-0.148072</td>\n",
        "      <td> 0.967214</td>\n",
        "      <td> 0.970617</td>\n",
        "      <td>-0.122630</td>\n",
        "      <td>-0.124656</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>rerror_rate</th>\n",
        "      <td>-0.025936</td>\n",
        "      <td>-0.342180</td>\n",
        "      <td>-0.100958</td>\n",
        "      <td>-0.000451</td>\n",
        "      <td> 0.000430</td>\n",
        "      <td>-0.000705</td>\n",
        "      <td> 0.013468</td>\n",
        "      <td> 0.035324</td>\n",
        "      <td>-0.091962</td>\n",
        "      <td> 0.008573</td>\n",
        "      <td>-0.001104</td>\n",
        "      <td>-0.001227</td>\n",
        "      <td>-0.008610</td>\n",
        "      <td>-0.005069</td>\n",
        "      <td>-0.002541</td>\n",
        "      <td>-0.007581</td>\n",
        "      <td> 0.000536</td>\n",
        "      <td>-0.000550</td>\n",
        "      <td>-0.008867</td>\n",
        "      <td>-0.213824</td>\n",
        "      <td>-0.281468</td>\n",
        "      <td>-0.091157</td>\n",
        "      <td>-0.110664</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.978813</td>\n",
        "      <td>-0.327986</td>\n",
        "      <td> 0.345571</td>\n",
        "      <td>-0.017902</td>\n",
        "      <td>-0.067857</td>\n",
        "      <td>-0.330391</td>\n",
        "      <td>-0.303126</td>\n",
        "      <td> 0.308722</td>\n",
        "      <td>-0.278465</td>\n",
        "      <td> 0.073061</td>\n",
        "      <td>-0.094076</td>\n",
        "      <td>-0.110646</td>\n",
        "      <td> 0.910225</td>\n",
        "      <td> 0.911622</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_rerror_rate</th>\n",
        "      <td>-0.026420</td>\n",
        "      <td>-0.332977</td>\n",
        "      <td>-0.081307</td>\n",
        "      <td>-0.001690</td>\n",
        "      <td>-0.012676</td>\n",
        "      <td>-0.000726</td>\n",
        "      <td> 0.052003</td>\n",
        "      <td> 0.034876</td>\n",
        "      <td>-0.072287</td>\n",
        "      <td> 0.054006</td>\n",
        "      <td>-0.001143</td>\n",
        "      <td>-0.001253</td>\n",
        "      <td>-0.008708</td>\n",
        "      <td>-0.004775</td>\n",
        "      <td>-0.002572</td>\n",
        "      <td> 0.001874</td>\n",
        "      <td> 0.000346</td>\n",
        "      <td> 0.000457</td>\n",
        "      <td>-0.009193</td>\n",
        "      <td>-0.221352</td>\n",
        "      <td>-0.284034</td>\n",
        "      <td>-0.095285</td>\n",
        "      <td>-0.115286</td>\n",
        "      <td> 0.978813</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.316568</td>\n",
        "      <td> 0.333439</td>\n",
        "      <td> 0.011285</td>\n",
        "      <td>-0.072595</td>\n",
        "      <td>-0.323032</td>\n",
        "      <td>-0.294328</td>\n",
        "      <td> 0.300186</td>\n",
        "      <td>-0.282239</td>\n",
        "      <td> 0.075178</td>\n",
        "      <td>-0.096146</td>\n",
        "      <td>-0.114341</td>\n",
        "      <td> 0.904591</td>\n",
        "      <td> 0.914904</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>same_srv_rate</th>\n",
        "      <td> 0.062291</td>\n",
        "      <td> 0.744046</td>\n",
        "      <td> 0.229677</td>\n",
        "      <td> 0.002153</td>\n",
        "      <td> 0.010218</td>\n",
        "      <td> 0.001521</td>\n",
        "      <td> 0.041342</td>\n",
        "      <td> 0.005716</td>\n",
        "      <td> 0.216969</td>\n",
        "      <td> 0.035253</td>\n",
        "      <td> 0.004946</td>\n",
        "      <td> 0.002634</td>\n",
        "      <td> 0.013881</td>\n",
        "      <td> 0.009784</td>\n",
        "      <td> 0.004282</td>\n",
        "      <td> 0.015499</td>\n",
        "      <td> 0.000208</td>\n",
        "      <td>-0.000159</td>\n",
        "      <td> 0.018042</td>\n",
        "      <td> 0.346718</td>\n",
        "      <td> 0.517227</td>\n",
        "      <td>-0.851915</td>\n",
        "      <td>-0.839315</td>\n",
        "      <td>-0.327986</td>\n",
        "      <td>-0.316568</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.982109</td>\n",
        "      <td> 0.140660</td>\n",
        "      <td>-0.190121</td>\n",
        "      <td> 0.848754</td>\n",
        "      <td> 0.873551</td>\n",
        "      <td>-0.844537</td>\n",
        "      <td> 0.732841</td>\n",
        "      <td> 0.179040</td>\n",
        "      <td>-0.830067</td>\n",
        "      <td>-0.819335</td>\n",
        "      <td>-0.282487</td>\n",
        "      <td>-0.282913</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>diff_srv_rate</th>\n",
        "      <td>-0.050875</td>\n",
        "      <td>-0.739988</td>\n",
        "      <td>-0.222572</td>\n",
        "      <td>-0.001846</td>\n",
        "      <td>-0.009386</td>\n",
        "      <td>-0.001522</td>\n",
        "      <td>-0.040555</td>\n",
        "      <td>-0.005538</td>\n",
        "      <td>-0.214019</td>\n",
        "      <td>-0.034953</td>\n",
        "      <td>-0.004553</td>\n",
        "      <td>-0.002649</td>\n",
        "      <td>-0.011337</td>\n",
        "      <td>-0.008711</td>\n",
        "      <td>-0.003743</td>\n",
        "      <td>-0.015112</td>\n",
        "      <td> 0.000328</td>\n",
        "      <td>-0.000235</td>\n",
        "      <td>-0.017000</td>\n",
        "      <td>-0.361737</td>\n",
        "      <td>-0.511998</td>\n",
        "      <td> 0.828012</td>\n",
        "      <td> 0.815305</td>\n",
        "      <td> 0.345571</td>\n",
        "      <td> 0.333439</td>\n",
        "      <td>-0.982109</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.138293</td>\n",
        "      <td> 0.185942</td>\n",
        "      <td>-0.844028</td>\n",
        "      <td>-0.868580</td>\n",
        "      <td> 0.850911</td>\n",
        "      <td>-0.727031</td>\n",
        "      <td>-0.176930</td>\n",
        "      <td> 0.807205</td>\n",
        "      <td> 0.795844</td>\n",
        "      <td> 0.299041</td>\n",
        "      <td> 0.298904</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_diff_host_rate</th>\n",
        "      <td> 0.123621</td>\n",
        "      <td>-0.104042</td>\n",
        "      <td> 0.521003</td>\n",
        "      <td> 0.020678</td>\n",
        "      <td> 0.012117</td>\n",
        "      <td>-0.000788</td>\n",
        "      <td> 0.032141</td>\n",
        "      <td>-0.003096</td>\n",
        "      <td> 0.503807</td>\n",
        "      <td> 0.036497</td>\n",
        "      <td> 0.002286</td>\n",
        "      <td> 0.000348</td>\n",
        "      <td> 0.006316</td>\n",
        "      <td> 0.014412</td>\n",
        "      <td> 0.001096</td>\n",
        "      <td> 0.024266</td>\n",
        "      <td>-0.000141</td>\n",
        "      <td>-0.000360</td>\n",
        "      <td>-0.008878</td>\n",
        "      <td>-0.384010</td>\n",
        "      <td>-0.239057</td>\n",
        "      <td>-0.121489</td>\n",
        "      <td>-0.112222</td>\n",
        "      <td>-0.017902</td>\n",
        "      <td> 0.011285</td>\n",
        "      <td> 0.140660</td>\n",
        "      <td>-0.138293</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.445051</td>\n",
        "      <td> 0.035010</td>\n",
        "      <td> 0.068648</td>\n",
        "      <td>-0.050472</td>\n",
        "      <td>-0.222707</td>\n",
        "      <td> 0.433173</td>\n",
        "      <td>-0.097973</td>\n",
        "      <td>-0.092661</td>\n",
        "      <td> 0.022585</td>\n",
        "      <td> 0.024722</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_count</th>\n",
        "      <td>-0.161107</td>\n",
        "      <td> 0.130377</td>\n",
        "      <td>-0.611972</td>\n",
        "      <td>-0.019923</td>\n",
        "      <td>-0.029149</td>\n",
        "      <td>-0.005894</td>\n",
        "      <td>-0.074178</td>\n",
        "      <td>-0.028369</td>\n",
        "      <td>-0.682721</td>\n",
        "      <td>-0.041615</td>\n",
        "      <td>-0.021367</td>\n",
        "      <td>-0.006697</td>\n",
        "      <td>-0.078717</td>\n",
        "      <td>-0.049529</td>\n",
        "      <td>-0.021200</td>\n",
        "      <td>-0.023865</td>\n",
        "      <td>-0.000424</td>\n",
        "      <td>-0.000106</td>\n",
        "      <td>-0.055453</td>\n",
        "      <td> 0.547443</td>\n",
        "      <td> 0.442611</td>\n",
        "      <td> 0.165350</td>\n",
        "      <td> 0.160322</td>\n",
        "      <td>-0.067857</td>\n",
        "      <td>-0.072595</td>\n",
        "      <td>-0.190121</td>\n",
        "      <td> 0.185942</td>\n",
        "      <td>-0.445051</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.022731</td>\n",
        "      <td>-0.070448</td>\n",
        "      <td> 0.044338</td>\n",
        "      <td> 0.189876</td>\n",
        "      <td>-0.918894</td>\n",
        "      <td> 0.123881</td>\n",
        "      <td> 0.113845</td>\n",
        "      <td>-0.125142</td>\n",
        "      <td>-0.125273</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_count</th>\n",
        "      <td>-0.217167</td>\n",
        "      <td> 0.741979</td>\n",
        "      <td> 0.024124</td>\n",
        "      <td>-0.012341</td>\n",
        "      <td>-0.058225</td>\n",
        "      <td>-0.005698</td>\n",
        "      <td>-0.017960</td>\n",
        "      <td>-0.015092</td>\n",
        "      <td> 0.080352</td>\n",
        "      <td> 0.003465</td>\n",
        "      <td>-0.011906</td>\n",
        "      <td>-0.006288</td>\n",
        "      <td>-0.038689</td>\n",
        "      <td>-0.026890</td>\n",
        "      <td>-0.012017</td>\n",
        "      <td>-0.023657</td>\n",
        "      <td>-0.000280</td>\n",
        "      <td> 0.000206</td>\n",
        "      <td>-0.044366</td>\n",
        "      <td> 0.586979</td>\n",
        "      <td> 0.720746</td>\n",
        "      <td>-0.724317</td>\n",
        "      <td>-0.713313</td>\n",
        "      <td>-0.330391</td>\n",
        "      <td>-0.323032</td>\n",
        "      <td> 0.848754</td>\n",
        "      <td>-0.844028</td>\n",
        "      <td> 0.035010</td>\n",
        "      <td> 0.022731</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.970072</td>\n",
        "      <td>-0.955178</td>\n",
        "      <td> 0.769481</td>\n",
        "      <td> 0.043668</td>\n",
        "      <td>-0.722607</td>\n",
        "      <td>-0.708392</td>\n",
        "      <td>-0.312040</td>\n",
        "      <td>-0.300787</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_same_srv_rate</th>\n",
        "      <td>-0.211979</td>\n",
        "      <td> 0.729151</td>\n",
        "      <td> 0.055033</td>\n",
        "      <td> 0.002576</td>\n",
        "      <td>-0.049560</td>\n",
        "      <td>-0.004078</td>\n",
        "      <td> 0.018783</td>\n",
        "      <td> 0.003004</td>\n",
        "      <td> 0.114526</td>\n",
        "      <td> 0.038980</td>\n",
        "      <td> 0.000515</td>\n",
        "      <td>-0.005738</td>\n",
        "      <td>-0.038935</td>\n",
        "      <td>-0.021731</td>\n",
        "      <td>-0.009962</td>\n",
        "      <td>-0.021358</td>\n",
        "      <td>-0.000503</td>\n",
        "      <td> 0.000229</td>\n",
        "      <td>-0.041749</td>\n",
        "      <td> 0.539698</td>\n",
        "      <td> 0.681955</td>\n",
        "      <td>-0.745745</td>\n",
        "      <td>-0.734334</td>\n",
        "      <td>-0.303126</td>\n",
        "      <td>-0.294328</td>\n",
        "      <td> 0.873551</td>\n",
        "      <td>-0.868580</td>\n",
        "      <td> 0.068648</td>\n",
        "      <td>-0.070448</td>\n",
        "      <td> 0.970072</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.980245</td>\n",
        "      <td> 0.771158</td>\n",
        "      <td> 0.107926</td>\n",
        "      <td>-0.742045</td>\n",
        "      <td>-0.725272</td>\n",
        "      <td>-0.278068</td>\n",
        "      <td>-0.264383</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_diff_srv_rate</th>\n",
        "      <td> 0.231644</td>\n",
        "      <td>-0.712965</td>\n",
        "      <td>-0.035073</td>\n",
        "      <td>-0.001803</td>\n",
        "      <td> 0.055542</td>\n",
        "      <td> 0.005208</td>\n",
        "      <td>-0.017198</td>\n",
        "      <td>-0.002960</td>\n",
        "      <td>-0.093565</td>\n",
        "      <td>-0.039091</td>\n",
        "      <td>-0.000916</td>\n",
        "      <td> 0.006687</td>\n",
        "      <td> 0.047414</td>\n",
        "      <td> 0.027092</td>\n",
        "      <td> 0.010761</td>\n",
        "      <td> 0.026703</td>\n",
        "      <td>-0.000181</td>\n",
        "      <td>-0.000004</td>\n",
        "      <td> 0.044640</td>\n",
        "      <td>-0.546869</td>\n",
        "      <td>-0.673916</td>\n",
        "      <td> 0.719708</td>\n",
        "      <td> 0.707753</td>\n",
        "      <td> 0.308722</td>\n",
        "      <td> 0.300186</td>\n",
        "      <td>-0.844537</td>\n",
        "      <td> 0.850911</td>\n",
        "      <td>-0.050472</td>\n",
        "      <td> 0.044338</td>\n",
        "      <td>-0.955178</td>\n",
        "      <td>-0.980245</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.766402</td>\n",
        "      <td>-0.088665</td>\n",
        "      <td> 0.719275</td>\n",
        "      <td> 0.701149</td>\n",
        "      <td> 0.287476</td>\n",
        "      <td> 0.271067</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_same_src_port_rate</th>\n",
        "      <td>-0.065202</td>\n",
        "      <td> 0.815039</td>\n",
        "      <td>-0.396195</td>\n",
        "      <td> 0.004265</td>\n",
        "      <td>-0.015449</td>\n",
        "      <td>-0.001939</td>\n",
        "      <td>-0.086998</td>\n",
        "      <td>-0.006617</td>\n",
        "      <td>-0.359506</td>\n",
        "      <td>-0.078843</td>\n",
        "      <td>-0.004617</td>\n",
        "      <td>-0.005020</td>\n",
        "      <td>-0.015968</td>\n",
        "      <td>-0.015018</td>\n",
        "      <td>-0.003521</td>\n",
        "      <td>-0.033288</td>\n",
        "      <td>-0.000455</td>\n",
        "      <td> 0.000283</td>\n",
        "      <td>-0.038092</td>\n",
        "      <td> 0.776906</td>\n",
        "      <td> 0.812280</td>\n",
        "      <td>-0.650336</td>\n",
        "      <td>-0.646256</td>\n",
        "      <td>-0.278465</td>\n",
        "      <td>-0.282239</td>\n",
        "      <td> 0.732841</td>\n",
        "      <td>-0.727031</td>\n",
        "      <td>-0.222707</td>\n",
        "      <td> 0.189876</td>\n",
        "      <td> 0.769481</td>\n",
        "      <td> 0.771158</td>\n",
        "      <td>-0.766402</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.175310</td>\n",
        "      <td>-0.658737</td>\n",
        "      <td>-0.652636</td>\n",
        "      <td>-0.299273</td>\n",
        "      <td>-0.297100</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_diff_host_rate</th>\n",
        "      <td> 0.100692</td>\n",
        "      <td>-0.140231</td>\n",
        "      <td> 0.578557</td>\n",
        "      <td> 0.016171</td>\n",
        "      <td> 0.007306</td>\n",
        "      <td>-0.000976</td>\n",
        "      <td>-0.014141</td>\n",
        "      <td>-0.002588</td>\n",
        "      <td> 0.659078</td>\n",
        "      <td>-0.020979</td>\n",
        "      <td> 0.008631</td>\n",
        "      <td> 0.001052</td>\n",
        "      <td> 0.061030</td>\n",
        "      <td> 0.030590</td>\n",
        "      <td> 0.015882</td>\n",
        "      <td> 0.011765</td>\n",
        "      <td> 0.000288</td>\n",
        "      <td> 0.000538</td>\n",
        "      <td>-0.012578</td>\n",
        "      <td>-0.496554</td>\n",
        "      <td>-0.391712</td>\n",
        "      <td>-0.153568</td>\n",
        "      <td>-0.148072</td>\n",
        "      <td> 0.073061</td>\n",
        "      <td> 0.075178</td>\n",
        "      <td> 0.179040</td>\n",
        "      <td>-0.176930</td>\n",
        "      <td> 0.433173</td>\n",
        "      <td>-0.918894</td>\n",
        "      <td> 0.043668</td>\n",
        "      <td> 0.107926</td>\n",
        "      <td>-0.088665</td>\n",
        "      <td>-0.175310</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.118697</td>\n",
        "      <td>-0.103715</td>\n",
        "      <td> 0.114971</td>\n",
        "      <td> 0.120767</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_serror_rate</th>\n",
        "      <td>-0.056753</td>\n",
        "      <td>-0.645920</td>\n",
        "      <td>-0.167047</td>\n",
        "      <td> 0.013566</td>\n",
        "      <td> 0.010387</td>\n",
        "      <td>-0.001381</td>\n",
        "      <td>-0.004706</td>\n",
        "      <td> 0.014713</td>\n",
        "      <td>-0.143283</td>\n",
        "      <td>-0.005019</td>\n",
        "      <td>-0.003498</td>\n",
        "      <td> 0.001974</td>\n",
        "      <td>-0.008457</td>\n",
        "      <td>-0.002257</td>\n",
        "      <td>-0.001588</td>\n",
        "      <td>-0.011197</td>\n",
        "      <td>-0.000011</td>\n",
        "      <td>-0.000076</td>\n",
        "      <td>-0.001066</td>\n",
        "      <td>-0.331571</td>\n",
        "      <td>-0.449096</td>\n",
        "      <td> 0.973947</td>\n",
        "      <td> 0.967214</td>\n",
        "      <td>-0.094076</td>\n",
        "      <td>-0.096146</td>\n",
        "      <td>-0.830067</td>\n",
        "      <td> 0.807205</td>\n",
        "      <td>-0.097973</td>\n",
        "      <td> 0.123881</td>\n",
        "      <td>-0.722607</td>\n",
        "      <td>-0.742045</td>\n",
        "      <td> 0.719275</td>\n",
        "      <td>-0.658737</td>\n",
        "      <td>-0.118697</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.968015</td>\n",
        "      <td>-0.087531</td>\n",
        "      <td>-0.096899</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_serror_rate</th>\n",
        "      <td>-0.057298</td>\n",
        "      <td>-0.641792</td>\n",
        "      <td>-0.158378</td>\n",
        "      <td> 0.012265</td>\n",
        "      <td>-0.024117</td>\n",
        "      <td>-0.001370</td>\n",
        "      <td>-0.010721</td>\n",
        "      <td> 0.014914</td>\n",
        "      <td>-0.132474</td>\n",
        "      <td>-0.004504</td>\n",
        "      <td>-0.003032</td>\n",
        "      <td> 0.002893</td>\n",
        "      <td>-0.007096</td>\n",
        "      <td>-0.004295</td>\n",
        "      <td>-0.002357</td>\n",
        "      <td>-0.011487</td>\n",
        "      <td>-0.000372</td>\n",
        "      <td>-0.000007</td>\n",
        "      <td>-0.016885</td>\n",
        "      <td>-0.335290</td>\n",
        "      <td>-0.442823</td>\n",
        "      <td> 0.965663</td>\n",
        "      <td> 0.970617</td>\n",
        "      <td>-0.110646</td>\n",
        "      <td>-0.114341</td>\n",
        "      <td>-0.819335</td>\n",
        "      <td> 0.795844</td>\n",
        "      <td>-0.092661</td>\n",
        "      <td> 0.113845</td>\n",
        "      <td>-0.708392</td>\n",
        "      <td>-0.725272</td>\n",
        "      <td> 0.701149</td>\n",
        "      <td>-0.652636</td>\n",
        "      <td>-0.103715</td>\n",
        "      <td> 0.968015</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td>-0.111578</td>\n",
        "      <td>-0.110532</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_rerror_rate</th>\n",
        "      <td>-0.007759</td>\n",
        "      <td>-0.297338</td>\n",
        "      <td>-0.003042</td>\n",
        "      <td> 0.000389</td>\n",
        "      <td> 0.046656</td>\n",
        "      <td>-0.000786</td>\n",
        "      <td> 0.199019</td>\n",
        "      <td> 0.032395</td>\n",
        "      <td> 0.007236</td>\n",
        "      <td> 0.214115</td>\n",
        "      <td> 0.002763</td>\n",
        "      <td> 0.003173</td>\n",
        "      <td>-0.000421</td>\n",
        "      <td> 0.000626</td>\n",
        "      <td>-0.000617</td>\n",
        "      <td>-0.004743</td>\n",
        "      <td>-0.000823</td>\n",
        "      <td>-0.000435</td>\n",
        "      <td> 0.025282</td>\n",
        "      <td>-0.261194</td>\n",
        "      <td>-0.313442</td>\n",
        "      <td>-0.103198</td>\n",
        "      <td>-0.122630</td>\n",
        "      <td> 0.910225</td>\n",
        "      <td> 0.904591</td>\n",
        "      <td>-0.282487</td>\n",
        "      <td> 0.299041</td>\n",
        "      <td> 0.022585</td>\n",
        "      <td>-0.125142</td>\n",
        "      <td>-0.312040</td>\n",
        "      <td>-0.278068</td>\n",
        "      <td> 0.287476</td>\n",
        "      <td>-0.299273</td>\n",
        "      <td> 0.114971</td>\n",
        "      <td>-0.087531</td>\n",
        "      <td>-0.111578</td>\n",
        "      <td> 1.000000</td>\n",
        "      <td> 0.950964</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_rerror_rate</th>\n",
        "      <td>-0.013891</td>\n",
        "      <td>-0.300581</td>\n",
        "      <td> 0.001621</td>\n",
        "      <td>-0.001816</td>\n",
        "      <td>-0.013666</td>\n",
        "      <td>-0.000782</td>\n",
        "      <td> 0.189142</td>\n",
        "      <td> 0.032151</td>\n",
        "      <td> 0.012979</td>\n",
        "      <td> 0.217858</td>\n",
        "      <td> 0.002151</td>\n",
        "      <td> 0.001731</td>\n",
        "      <td>-0.005012</td>\n",
        "      <td>-0.001096</td>\n",
        "      <td>-0.002020</td>\n",
        "      <td>-0.004552</td>\n",
        "      <td>-0.001038</td>\n",
        "      <td>-0.000529</td>\n",
        "      <td>-0.004292</td>\n",
        "      <td>-0.256176</td>\n",
        "      <td>-0.308132</td>\n",
        "      <td>-0.105434</td>\n",
        "      <td>-0.124656</td>\n",
        "      <td> 0.911622</td>\n",
        "      <td> 0.914904</td>\n",
        "      <td>-0.282913</td>\n",
        "      <td> 0.298904</td>\n",
        "      <td> 0.024722</td>\n",
        "      <td>-0.125273</td>\n",
        "      <td>-0.300787</td>\n",
        "      <td>-0.264383</td>\n",
        "      <td> 0.271067</td>\n",
        "      <td>-0.297100</td>\n",
        "      <td> 0.120767</td>\n",
        "      <td>-0.096899</td>\n",
        "      <td>-0.110532</td>\n",
        "      <td> 0.950964</td>\n",
        "      <td> 1.000000</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 21,
       "text": [
        "                             duration  src_bytes  dst_bytes      land  \\\n",
        "duration                     1.000000   0.014196   0.299189 -0.001068   \n",
        "src_bytes                    0.014196   1.000000  -0.167931 -0.009404   \n",
        "dst_bytes                    0.299189  -0.167931   1.000000 -0.003040   \n",
        "land                        -0.001068  -0.009404  -0.003040  1.000000   \n",
        "wrong_fragment              -0.008025  -0.019358  -0.022659 -0.000333   \n",
        "urgent                       0.017883   0.000094   0.007234 -0.000065   \n",
        "hot                          0.108639   0.113920   0.193156 -0.000539   \n",
        "num_failed_logins            0.014363  -0.008396   0.021952 -0.000076   \n",
        "logged_in                    0.159564  -0.089702   0.882185 -0.002785   \n",
        "num_compromised              0.010687   0.118562   0.169772 -0.000447   \n",
        "root_shell                   0.040425   0.003067   0.026054 -0.000093   \n",
        "su_attempted                 0.026015   0.002282   0.012192 -0.000049   \n",
        "num_root                     0.013401  -0.002050  -0.003884 -0.000230   \n",
        "num_file_creations           0.061099   0.027710   0.034154 -0.000150   \n",
        "num_shells                   0.008632   0.014403  -0.000054 -0.000076   \n",
        "num_access_files             0.019407  -0.001497   0.065776 -0.000211   \n",
        "num_outbound_cmds           -0.000019   0.000010  -0.000031 -0.002881   \n",
        "is_hot_login                -0.000010   0.000019   0.000041  0.002089   \n",
        "is_guest_login               0.205606   0.027511   0.085947 -0.000250   \n",
        "count                       -0.259032   0.666230  -0.639157 -0.010939   \n",
        "srv_count                   -0.250139   0.722609  -0.497683 -0.010128   \n",
        "serror_rate                 -0.074211  -0.657460  -0.205848  0.014160   \n",
        "srv_serror_rate             -0.073663  -0.652391  -0.198715  0.014342   \n",
        "rerror_rate                 -0.025936  -0.342180  -0.100958 -0.000451   \n",
        "srv_rerror_rate             -0.026420  -0.332977  -0.081307 -0.001690   \n",
        "same_srv_rate                0.062291   0.744046   0.229677  0.002153   \n",
        "diff_srv_rate               -0.050875  -0.739988  -0.222572 -0.001846   \n",
        "srv_diff_host_rate           0.123621  -0.104042   0.521003  0.020678   \n",
        "dst_host_count              -0.161107   0.130377  -0.611972 -0.019923   \n",
        "dst_host_srv_count          -0.217167   0.741979   0.024124 -0.012341   \n",
        "dst_host_same_srv_rate      -0.211979   0.729151   0.055033  0.002576   \n",
        "dst_host_diff_srv_rate       0.231644  -0.712965  -0.035073 -0.001803   \n",
        "dst_host_same_src_port_rate -0.065202   0.815039  -0.396195  0.004265   \n",
        "dst_host_srv_diff_host_rate  0.100692  -0.140231   0.578557  0.016171   \n",
        "dst_host_serror_rate        -0.056753  -0.645920  -0.167047  0.013566   \n",
        "dst_host_srv_serror_rate    -0.057298  -0.641792  -0.158378  0.012265   \n",
        "dst_host_rerror_rate        -0.007759  -0.297338  -0.003042  0.000389   \n",
        "dst_host_srv_rerror_rate    -0.013891  -0.300581   0.001621 -0.001816   \n",
        "\n",
        "                             wrong_fragment    urgent       hot  \\\n",
        "duration                          -0.008025  0.017883  0.108639   \n",
        "src_bytes                         -0.019358  0.000094  0.113920   \n",
        "dst_bytes                         -0.022659  0.007234  0.193156   \n",
        "land                              -0.000333 -0.000065 -0.000539   \n",
        "wrong_fragment                     1.000000 -0.000150 -0.004042   \n",
        "urgent                            -0.000150  1.000000  0.008594   \n",
        "hot                               -0.004042  0.008594  1.000000   \n",
        "num_failed_logins                 -0.000568  0.063009  0.112560   \n",
        "logged_in                         -0.020911  0.006821  0.189126   \n",
        "num_compromised                   -0.003370  0.031765  0.811529   \n",
        "root_shell                        -0.000528  0.067437  0.101983   \n",
        "su_attempted                      -0.000248  0.000020 -0.000400   \n",
        "num_root                          -0.001727  0.061994  0.003096   \n",
        "num_file_creations                -0.001160  0.061383  0.028694   \n",
        "num_shells                        -0.000507 -0.000066  0.009146   \n",
        "num_access_files                  -0.001519  0.023380  0.004224   \n",
        "num_outbound_cmds                 -0.000147  0.012879 -0.000393   \n",
        "is_hot_login                       0.000441  0.005162 -0.000248   \n",
        "is_guest_login                    -0.001869 -0.000100  0.463706   \n",
        "count                             -0.057711 -0.004778 -0.120847   \n",
        "srv_count                         -0.029117 -0.004799 -0.114735   \n",
        "serror_rate                       -0.008849 -0.001338 -0.035487   \n",
        "srv_serror_rate                   -0.023382 -0.001327 -0.034934   \n",
        "rerror_rate                        0.000430 -0.000705  0.013468   \n",
        "srv_rerror_rate                   -0.012676 -0.000726  0.052003   \n",
        "same_srv_rate                      0.010218  0.001521  0.041342   \n",
        "diff_srv_rate                     -0.009386 -0.001522 -0.040555   \n",
        "srv_diff_host_rate                 0.012117 -0.000788  0.032141   \n",
        "dst_host_count                    -0.029149 -0.005894 -0.074178   \n",
        "dst_host_srv_count                -0.058225 -0.005698 -0.017960   \n",
        "dst_host_same_srv_rate            -0.049560 -0.004078  0.018783   \n",
        "dst_host_diff_srv_rate             0.055542  0.005208 -0.017198   \n",
        "dst_host_same_src_port_rate       -0.015449 -0.001939 -0.086998   \n",
        "dst_host_srv_diff_host_rate        0.007306 -0.000976 -0.014141   \n",
        "dst_host_serror_rate               0.010387 -0.001381 -0.004706   \n",
        "dst_host_srv_serror_rate          -0.024117 -0.001370 -0.010721   \n",
        "dst_host_rerror_rate               0.046656 -0.000786  0.199019   \n",
        "dst_host_srv_rerror_rate          -0.013666 -0.000782  0.189142   \n",
        "\n",
        "                             num_failed_logins  logged_in  num_compromised  \\\n",
        "duration                              0.014363   0.159564         0.010687   \n",
        "src_bytes                            -0.008396  -0.089702         0.118562   \n",
        "dst_bytes                             0.021952   0.882185         0.169772   \n",
        "land                                 -0.000076  -0.002785        -0.000447   \n",
        "wrong_fragment                       -0.000568  -0.020911        -0.003370   \n",
        "urgent                                0.063009   0.006821         0.031765   \n",
        "hot                                   0.112560   0.189126         0.811529   \n",
        "num_failed_logins                     1.000000  -0.002190         0.004619   \n",
        "logged_in                            -0.002190   1.000000         0.161190   \n",
        "num_compromised                       0.004619   0.161190         1.000000   \n",
        "root_shell                            0.016895   0.025293         0.085558   \n",
        "su_attempted                          0.072748   0.011813         0.048985   \n",
        "num_root                              0.010060   0.082533         0.028557   \n",
        "num_file_creations                    0.015211   0.055530         0.031223   \n",
        "num_shells                           -0.000093   0.024354         0.011256   \n",
        "num_access_files                      0.005581   0.072698         0.006977   \n",
        "num_outbound_cmds                     0.003431   0.000079         0.001048   \n",
        "is_hot_login                         -0.001560   0.000127        -0.000438   \n",
        "is_guest_login                       -0.000428   0.089318        -0.002504   \n",
        "count                                -0.018024  -0.578287        -0.097212   \n",
        "srv_count                            -0.018027  -0.438947        -0.091154   \n",
        "serror_rate                          -0.003674  -0.187114        -0.030516   \n",
        "srv_serror_rate                      -0.004027  -0.180122        -0.030264   \n",
        "rerror_rate                           0.035324  -0.091962         0.008573   \n",
        "srv_rerror_rate                       0.034876  -0.072287         0.054006   \n",
        "same_srv_rate                         0.005716   0.216969         0.035253   \n",
        "diff_srv_rate                        -0.005538  -0.214019        -0.034953   \n",
        "srv_diff_host_rate                   -0.003096   0.503807         0.036497   \n",
        "dst_host_count                       -0.028369  -0.682721        -0.041615   \n",
        "dst_host_srv_count                   -0.015092   0.080352         0.003465   \n",
        "dst_host_same_srv_rate                0.003004   0.114526         0.038980   \n",
        "dst_host_diff_srv_rate               -0.002960  -0.093565        -0.039091   \n",
        "dst_host_same_src_port_rate          -0.006617  -0.359506        -0.078843   \n",
        "dst_host_srv_diff_host_rate          -0.002588   0.659078        -0.020979   \n",
        "dst_host_serror_rate                  0.014713  -0.143283        -0.005019   \n",
        "dst_host_srv_serror_rate              0.014914  -0.132474        -0.004504   \n",
        "dst_host_rerror_rate                  0.032395   0.007236         0.214115   \n",
        "dst_host_srv_rerror_rate              0.032151   0.012979         0.217858   \n",
        "\n",
        "                             root_shell  su_attempted  num_root  \\\n",
        "duration                       0.040425      0.026015  0.013401   \n",
        "src_bytes                      0.003067      0.002282 -0.002050   \n",
        "dst_bytes                      0.026054      0.012192 -0.003884   \n",
        "land                          -0.000093     -0.000049 -0.000230   \n",
        "wrong_fragment                -0.000528     -0.000248 -0.001727   \n",
        "urgent                         0.067437      0.000020  0.061994   \n",
        "hot                            0.101983     -0.000400  0.003096   \n",
        "num_failed_logins              0.016895      0.072748  0.010060   \n",
        "logged_in                      0.025293      0.011813  0.082533   \n",
        "num_compromised                0.085558      0.048985  0.028557   \n",
        "root_shell                     1.000000      0.233486  0.094512   \n",
        "su_attempted                   0.233486      1.000000  0.119326   \n",
        "num_root                       0.094512      0.119326  1.000000   \n",
        "num_file_creations             0.140650      0.053110  0.047521   \n",
        "num_shells                     0.132056      0.040487  0.034405   \n",
        "num_access_files               0.069353      0.081272  0.014513   \n",
        "num_outbound_cmds              0.011462     -0.018896  0.001524   \n",
        "is_hot_login                  -0.006602      0.012927 -0.002585   \n",
        "is_guest_login                -0.000405     -0.000219 -0.001281   \n",
        "count                         -0.016409     -0.008279 -0.054721   \n",
        "srv_count                     -0.015174     -0.008225 -0.053530   \n",
        "serror_rate                   -0.004952     -0.002318 -0.016031   \n",
        "srv_serror_rate               -0.004923     -0.002295 -0.015936   \n",
        "rerror_rate                   -0.001104     -0.001227 -0.008610   \n",
        "srv_rerror_rate               -0.001143     -0.001253 -0.008708   \n",
        "same_srv_rate                  0.004946      0.002634  0.013881   \n",
        "diff_srv_rate                 -0.004553     -0.002649 -0.011337   \n",
        "srv_diff_host_rate             0.002286      0.000348  0.006316   \n",
        "dst_host_count                -0.021367     -0.006697 -0.078717   \n",
        "dst_host_srv_count            -0.011906     -0.006288 -0.038689   \n",
        "dst_host_same_srv_rate         0.000515     -0.005738 -0.038935   \n",
        "dst_host_diff_srv_rate        -0.000916      0.006687  0.047414   \n",
        "dst_host_same_src_port_rate   -0.004617     -0.005020 -0.015968   \n",
        "dst_host_srv_diff_host_rate    0.008631      0.001052  0.061030   \n",
        "dst_host_serror_rate          -0.003498      0.001974 -0.008457   \n",
        "dst_host_srv_serror_rate      -0.003032      0.002893 -0.007096   \n",
        "dst_host_rerror_rate           0.002763      0.003173 -0.000421   \n",
        "dst_host_srv_rerror_rate       0.002151      0.001731 -0.005012   \n",
        "\n",
        "                             num_file_creations  num_shells  num_access_files  \\\n",
        "duration                               0.061099    0.008632          0.019407   \n",
        "src_bytes                              0.027710    0.014403         -0.001497   \n",
        "dst_bytes                              0.034154   -0.000054          0.065776   \n",
        "land                                  -0.000150   -0.000076         -0.000211   \n",
        "wrong_fragment                        -0.001160   -0.000507         -0.001519   \n",
        "urgent                                 0.061383   -0.000066          0.023380   \n",
        "hot                                    0.028694    0.009146          0.004224   \n",
        "num_failed_logins                      0.015211   -0.000093          0.005581   \n",
        "logged_in                              0.055530    0.024354          0.072698   \n",
        "num_compromised                        0.031223    0.011256          0.006977   \n",
        "root_shell                             0.140650    0.132056          0.069353   \n",
        "su_attempted                           0.053110    0.040487          0.081272   \n",
        "num_root                               0.047521    0.034405          0.014513   \n",
        "num_file_creations                     1.000000    0.068660          0.031042   \n",
        "num_shells                             0.068660    1.000000          0.019438   \n",
        "num_access_files                       0.031042    0.019438          1.000000   \n",
        "num_outbound_cmds                     -0.004081   -0.002592         -0.001597   \n",
        "is_hot_login                          -0.001664   -0.006631         -0.002850   \n",
        "is_guest_login                         0.013242   -0.000405          0.002466   \n",
        "count                                 -0.036467   -0.013938         -0.045282   \n",
        "srv_count                             -0.034598   -0.011784         -0.040497   \n",
        "serror_rate                           -0.009703   -0.004343         -0.013945   \n",
        "srv_serror_rate                       -0.010390   -0.004740         -0.013572   \n",
        "rerror_rate                           -0.005069   -0.002541         -0.007581   \n",
        "srv_rerror_rate                       -0.004775   -0.002572          0.001874   \n",
        "same_srv_rate                          0.009784    0.004282          0.015499   \n",
        "diff_srv_rate                         -0.008711   -0.003743         -0.015112   \n",
        "srv_diff_host_rate                     0.014412    0.001096          0.024266   \n",
        "dst_host_count                        -0.049529   -0.021200         -0.023865   \n",
        "dst_host_srv_count                    -0.026890   -0.012017         -0.023657   \n",
        "dst_host_same_srv_rate                -0.021731   -0.009962         -0.021358   \n",
        "dst_host_diff_srv_rate                 0.027092    0.010761          0.026703   \n",
        "dst_host_same_src_port_rate           -0.015018   -0.003521         -0.033288   \n",
        "dst_host_srv_diff_host_rate            0.030590    0.015882          0.011765   \n",
        "dst_host_serror_rate                  -0.002257   -0.001588         -0.011197   \n",
        "dst_host_srv_serror_rate              -0.004295   -0.002357         -0.011487   \n",
        "dst_host_rerror_rate                   0.000626   -0.000617         -0.004743   \n",
        "dst_host_srv_rerror_rate              -0.001096   -0.002020         -0.004552   \n",
        "\n",
        "                             num_outbound_cmds  is_hot_login  is_guest_login  \\\n",
        "duration                             -0.000019     -0.000010        0.205606   \n",
        "src_bytes                             0.000010      0.000019        0.027511   \n",
        "dst_bytes                            -0.000031      0.000041        0.085947   \n",
        "land                                 -0.002881      0.002089       -0.000250   \n",
        "wrong_fragment                       -0.000147      0.000441       -0.001869   \n",
        "urgent                                0.012879      0.005162       -0.000100   \n",
        "hot                                  -0.000393     -0.000248        0.463706   \n",
        "num_failed_logins                     0.003431     -0.001560       -0.000428   \n",
        "logged_in                             0.000079      0.000127        0.089318   \n",
        "num_compromised                       0.001048     -0.000438       -0.002504   \n",
        "root_shell                            0.011462     -0.006602       -0.000405   \n",
        "su_attempted                         -0.018896      0.012927       -0.000219   \n",
        "num_root                              0.001524     -0.002585       -0.001281   \n",
        "num_file_creations                   -0.004081     -0.001664        0.013242   \n",
        "num_shells                           -0.002592     -0.006631       -0.000405   \n",
        "num_access_files                     -0.001597     -0.002850        0.002466   \n",
        "num_outbound_cmds                     1.000000      0.822890        0.000924   \n",
        "is_hot_login                          0.822890      1.000000        0.001512   \n",
        "is_guest_login                        0.000924      0.001512        1.000000   \n",
        "count                                -0.000076      0.000036       -0.062340   \n",
        "srv_count                             0.000100      0.000064       -0.062713   \n",
        "serror_rate                           0.000167      0.000102       -0.017343   \n",
        "srv_serror_rate                       0.000209     -0.000302       -0.017240   \n",
        "rerror_rate                           0.000536     -0.000550       -0.008867   \n",
        "srv_rerror_rate                       0.000346      0.000457       -0.009193   \n",
        "same_srv_rate                         0.000208     -0.000159        0.018042   \n",
        "diff_srv_rate                         0.000328     -0.000235       -0.017000   \n",
        "srv_diff_host_rate                   -0.000141     -0.000360       -0.008878   \n",
        "dst_host_count                       -0.000424     -0.000106       -0.055453   \n",
        "dst_host_srv_count                   -0.000280      0.000206       -0.044366   \n",
        "dst_host_same_srv_rate               -0.000503      0.000229       -0.041749   \n",
        "dst_host_diff_srv_rate               -0.000181     -0.000004        0.044640   \n",
        "dst_host_same_src_port_rate          -0.000455      0.000283       -0.038092   \n",
        "dst_host_srv_diff_host_rate           0.000288      0.000538       -0.012578   \n",
        "dst_host_serror_rate                 -0.000011     -0.000076       -0.001066   \n",
        "dst_host_srv_serror_rate             -0.000372     -0.000007       -0.016885   \n",
        "dst_host_rerror_rate                 -0.000823     -0.000435        0.025282   \n",
        "dst_host_srv_rerror_rate             -0.001038     -0.000529       -0.004292   \n",
        "\n",
        "                                count  srv_count  serror_rate  \\\n",
        "duration                    -0.259032  -0.250139    -0.074211   \n",
        "src_bytes                    0.666230   0.722609    -0.657460   \n",
        "dst_bytes                   -0.639157  -0.497683    -0.205848   \n",
        "land                        -0.010939  -0.010128     0.014160   \n",
        "wrong_fragment              -0.057711  -0.029117    -0.008849   \n",
        "urgent                      -0.004778  -0.004799    -0.001338   \n",
        "hot                         -0.120847  -0.114735    -0.035487   \n",
        "num_failed_logins           -0.018024  -0.018027    -0.003674   \n",
        "logged_in                   -0.578287  -0.438947    -0.187114   \n",
        "num_compromised             -0.097212  -0.091154    -0.030516   \n",
        "root_shell                  -0.016409  -0.015174    -0.004952   \n",
        "su_attempted                -0.008279  -0.008225    -0.002318   \n",
        "num_root                    -0.054721  -0.053530    -0.016031   \n",
        "num_file_creations          -0.036467  -0.034598    -0.009703   \n",
        "num_shells                  -0.013938  -0.011784    -0.004343   \n",
        "num_access_files            -0.045282  -0.040497    -0.013945   \n",
        "num_outbound_cmds           -0.000076   0.000100     0.000167   \n",
        "is_hot_login                 0.000036   0.000064     0.000102   \n",
        "is_guest_login              -0.062340  -0.062713    -0.017343   \n",
        "count                        1.000000   0.950587    -0.303538   \n",
        "srv_count                    0.950587   1.000000    -0.428185   \n",
        "serror_rate                 -0.303538  -0.428185     1.000000   \n",
        "srv_serror_rate             -0.308923  -0.421424     0.990888   \n",
        "rerror_rate                 -0.213824  -0.281468    -0.091157   \n",
        "srv_rerror_rate             -0.221352  -0.284034    -0.095285   \n",
        "same_srv_rate                0.346718   0.517227    -0.851915   \n",
        "diff_srv_rate               -0.361737  -0.511998     0.828012   \n",
        "srv_diff_host_rate          -0.384010  -0.239057    -0.121489   \n",
        "dst_host_count               0.547443   0.442611     0.165350   \n",
        "dst_host_srv_count           0.586979   0.720746    -0.724317   \n",
        "dst_host_same_srv_rate       0.539698   0.681955    -0.745745   \n",
        "dst_host_diff_srv_rate      -0.546869  -0.673916     0.719708   \n",
        "dst_host_same_src_port_rate  0.776906   0.812280    -0.650336   \n",
        "dst_host_srv_diff_host_rate -0.496554  -0.391712    -0.153568   \n",
        "dst_host_serror_rate        -0.331571  -0.449096     0.973947   \n",
        "dst_host_srv_serror_rate    -0.335290  -0.442823     0.965663   \n",
        "dst_host_rerror_rate        -0.261194  -0.313442    -0.103198   \n",
        "dst_host_srv_rerror_rate    -0.256176  -0.308132    -0.105434   \n",
        "\n",
        "                             srv_serror_rate  rerror_rate  srv_rerror_rate  \\\n",
        "duration                           -0.073663    -0.025936        -0.026420   \n",
        "src_bytes                          -0.652391    -0.342180        -0.332977   \n",
        "dst_bytes                          -0.198715    -0.100958        -0.081307   \n",
        "land                                0.014342    -0.000451        -0.001690   \n",
        "wrong_fragment                     -0.023382     0.000430        -0.012676   \n",
        "urgent                             -0.001327    -0.000705        -0.000726   \n",
        "hot                                -0.034934     0.013468         0.052003   \n",
        "num_failed_logins                  -0.004027     0.035324         0.034876   \n",
        "logged_in                          -0.180122    -0.091962        -0.072287   \n",
        "num_compromised                    -0.030264     0.008573         0.054006   \n",
        "root_shell                         -0.004923    -0.001104        -0.001143   \n",
        "su_attempted                       -0.002295    -0.001227        -0.001253   \n",
        "num_root                           -0.015936    -0.008610        -0.008708   \n",
        "num_file_creations                 -0.010390    -0.005069        -0.004775   \n",
        "num_shells                         -0.004740    -0.002541        -0.002572   \n",
        "num_access_files                   -0.013572    -0.007581         0.001874   \n",
        "num_outbound_cmds                   0.000209     0.000536         0.000346   \n",
        "is_hot_login                       -0.000302    -0.000550         0.000457   \n",
        "is_guest_login                     -0.017240    -0.008867        -0.009193   \n",
        "count                              -0.308923    -0.213824        -0.221352   \n",
        "srv_count                          -0.421424    -0.281468        -0.284034   \n",
        "serror_rate                         0.990888    -0.091157        -0.095285   \n",
        "srv_serror_rate                     1.000000    -0.110664        -0.115286   \n",
        "rerror_rate                        -0.110664     1.000000         0.978813   \n",
        "srv_rerror_rate                    -0.115286     0.978813         1.000000   \n",
        "same_srv_rate                      -0.839315    -0.327986        -0.316568   \n",
        "diff_srv_rate                       0.815305     0.345571         0.333439   \n",
        "srv_diff_host_rate                 -0.112222    -0.017902         0.011285   \n",
        "dst_host_count                      0.160322    -0.067857        -0.072595   \n",
        "dst_host_srv_count                 -0.713313    -0.330391        -0.323032   \n",
        "dst_host_same_srv_rate             -0.734334    -0.303126        -0.294328   \n",
        "dst_host_diff_srv_rate              0.707753     0.308722         0.300186   \n",
        "dst_host_same_src_port_rate        -0.646256    -0.278465        -0.282239   \n",
        "dst_host_srv_diff_host_rate        -0.148072     0.073061         0.075178   \n",
        "dst_host_serror_rate                0.967214    -0.094076        -0.096146   \n",
        "dst_host_srv_serror_rate            0.970617    -0.110646        -0.114341   \n",
        "dst_host_rerror_rate               -0.122630     0.910225         0.904591   \n",
        "dst_host_srv_rerror_rate           -0.124656     0.911622         0.914904   \n",
        "\n",
        "                             same_srv_rate  diff_srv_rate  srv_diff_host_rate  \\\n",
        "duration                          0.062291      -0.050875            0.123621   \n",
        "src_bytes                         0.744046      -0.739988           -0.104042   \n",
        "dst_bytes                         0.229677      -0.222572            0.521003   \n",
        "land                              0.002153      -0.001846            0.020678   \n",
        "wrong_fragment                    0.010218      -0.009386            0.012117   \n",
        "urgent                            0.001521      -0.001522           -0.000788   \n",
        "hot                               0.041342      -0.040555            0.032141   \n",
        "num_failed_logins                 0.005716      -0.005538           -0.003096   \n",
        "logged_in                         0.216969      -0.214019            0.503807   \n",
        "num_compromised                   0.035253      -0.034953            0.036497   \n",
        "root_shell                        0.004946      -0.004553            0.002286   \n",
        "su_attempted                      0.002634      -0.002649            0.000348   \n",
        "num_root                          0.013881      -0.011337            0.006316   \n",
        "num_file_creations                0.009784      -0.008711            0.014412   \n",
        "num_shells                        0.004282      -0.003743            0.001096   \n",
        "num_access_files                  0.015499      -0.015112            0.024266   \n",
        "num_outbound_cmds                 0.000208       0.000328           -0.000141   \n",
        "is_hot_login                     -0.000159      -0.000235           -0.000360   \n",
        "is_guest_login                    0.018042      -0.017000           -0.008878   \n",
        "count                             0.346718      -0.361737           -0.384010   \n",
        "srv_count                         0.517227      -0.511998           -0.239057   \n",
        "serror_rate                      -0.851915       0.828012           -0.121489   \n",
        "srv_serror_rate                  -0.839315       0.815305           -0.112222   \n",
        "rerror_rate                      -0.327986       0.345571           -0.017902   \n",
        "srv_rerror_rate                  -0.316568       0.333439            0.011285   \n",
        "same_srv_rate                     1.000000      -0.982109            0.140660   \n",
        "diff_srv_rate                    -0.982109       1.000000           -0.138293   \n",
        "srv_diff_host_rate                0.140660      -0.138293            1.000000   \n",
        "dst_host_count                   -0.190121       0.185942           -0.445051   \n",
        "dst_host_srv_count                0.848754      -0.844028            0.035010   \n",
        "dst_host_same_srv_rate            0.873551      -0.868580            0.068648   \n",
        "dst_host_diff_srv_rate           -0.844537       0.850911           -0.050472   \n",
        "dst_host_same_src_port_rate       0.732841      -0.727031           -0.222707   \n",
        "dst_host_srv_diff_host_rate       0.179040      -0.176930            0.433173   \n",
        "dst_host_serror_rate             -0.830067       0.807205           -0.097973   \n",
        "dst_host_srv_serror_rate         -0.819335       0.795844           -0.092661   \n",
        "dst_host_rerror_rate             -0.282487       0.299041            0.022585   \n",
        "dst_host_srv_rerror_rate         -0.282913       0.298904            0.024722   \n",
        "\n",
        "                             dst_host_count  dst_host_srv_count  \\\n",
        "duration                          -0.161107           -0.217167   \n",
        "src_bytes                          0.130377            0.741979   \n",
        "dst_bytes                         -0.611972            0.024124   \n",
        "land                              -0.019923           -0.012341   \n",
        "wrong_fragment                    -0.029149           -0.058225   \n",
        "urgent                            -0.005894           -0.005698   \n",
        "hot                               -0.074178           -0.017960   \n",
        "num_failed_logins                 -0.028369           -0.015092   \n",
        "logged_in                         -0.682721            0.080352   \n",
        "num_compromised                   -0.041615            0.003465   \n",
        "root_shell                        -0.021367           -0.011906   \n",
        "su_attempted                      -0.006697           -0.006288   \n",
        "num_root                          -0.078717           -0.038689   \n",
        "num_file_creations                -0.049529           -0.026890   \n",
        "num_shells                        -0.021200           -0.012017   \n",
        "num_access_files                  -0.023865           -0.023657   \n",
        "num_outbound_cmds                 -0.000424           -0.000280   \n",
        "is_hot_login                      -0.000106            0.000206   \n",
        "is_guest_login                    -0.055453           -0.044366   \n",
        "count                              0.547443            0.586979   \n",
        "srv_count                          0.442611            0.720746   \n",
        "serror_rate                        0.165350           -0.724317   \n",
        "srv_serror_rate                    0.160322           -0.713313   \n",
        "rerror_rate                       -0.067857           -0.330391   \n",
        "srv_rerror_rate                   -0.072595           -0.323032   \n",
        "same_srv_rate                     -0.190121            0.848754   \n",
        "diff_srv_rate                      0.185942           -0.844028   \n",
        "srv_diff_host_rate                -0.445051            0.035010   \n",
        "dst_host_count                     1.000000            0.022731   \n",
        "dst_host_srv_count                 0.022731            1.000000   \n",
        "dst_host_same_srv_rate            -0.070448            0.970072   \n",
        "dst_host_diff_srv_rate             0.044338           -0.955178   \n",
        "dst_host_same_src_port_rate        0.189876            0.769481   \n",
        "dst_host_srv_diff_host_rate       -0.918894            0.043668   \n",
        "dst_host_serror_rate               0.123881           -0.722607   \n",
        "dst_host_srv_serror_rate           0.113845           -0.708392   \n",
        "dst_host_rerror_rate              -0.125142           -0.312040   \n",
        "dst_host_srv_rerror_rate          -0.125273           -0.300787   \n",
        "\n",
        "                             dst_host_same_srv_rate  dst_host_diff_srv_rate  \\\n",
        "duration                                  -0.211979                0.231644   \n",
        "src_bytes                                  0.729151               -0.712965   \n",
        "dst_bytes                                  0.055033               -0.035073   \n",
        "land                                       0.002576               -0.001803   \n",
        "wrong_fragment                            -0.049560                0.055542   \n",
        "urgent                                    -0.004078                0.005208   \n",
        "hot                                        0.018783               -0.017198   \n",
        "num_failed_logins                          0.003004               -0.002960   \n",
        "logged_in                                  0.114526               -0.093565   \n",
        "num_compromised                            0.038980               -0.039091   \n",
        "root_shell                                 0.000515               -0.000916   \n",
        "su_attempted                              -0.005738                0.006687   \n",
        "num_root                                  -0.038935                0.047414   \n",
        "num_file_creations                        -0.021731                0.027092   \n",
        "num_shells                                -0.009962                0.010761   \n",
        "num_access_files                          -0.021358                0.026703   \n",
        "num_outbound_cmds                         -0.000503               -0.000181   \n",
        "is_hot_login                               0.000229               -0.000004   \n",
        "is_guest_login                            -0.041749                0.044640   \n",
        "count                                      0.539698               -0.546869   \n",
        "srv_count                                  0.681955               -0.673916   \n",
        "serror_rate                               -0.745745                0.719708   \n",
        "srv_serror_rate                           -0.734334                0.707753   \n",
        "rerror_rate                               -0.303126                0.308722   \n",
        "srv_rerror_rate                           -0.294328                0.300186   \n",
        "same_srv_rate                              0.873551               -0.844537   \n",
        "diff_srv_rate                             -0.868580                0.850911   \n",
        "srv_diff_host_rate                         0.068648               -0.050472   \n",
        "dst_host_count                            -0.070448                0.044338   \n",
        "dst_host_srv_count                         0.970072               -0.955178   \n",
        "dst_host_same_srv_rate                     1.000000               -0.980245   \n",
        "dst_host_diff_srv_rate                    -0.980245                1.000000   \n",
        "dst_host_same_src_port_rate                0.771158               -0.766402   \n",
        "dst_host_srv_diff_host_rate                0.107926               -0.088665   \n",
        "dst_host_serror_rate                      -0.742045                0.719275   \n",
        "dst_host_srv_serror_rate                  -0.725272                0.701149   \n",
        "dst_host_rerror_rate                      -0.278068                0.287476   \n",
        "dst_host_srv_rerror_rate                  -0.264383                0.271067   \n",
        "\n",
        "                             dst_host_same_src_port_rate  \\\n",
        "duration                                       -0.065202   \n",
        "src_bytes                                       0.815039   \n",
        "dst_bytes                                      -0.396195   \n",
        "land                                            0.004265   \n",
        "wrong_fragment                                 -0.015449   \n",
        "urgent                                         -0.001939   \n",
        "hot                                            -0.086998   \n",
        "num_failed_logins                              -0.006617   \n",
        "logged_in                                      -0.359506   \n",
        "num_compromised                                -0.078843   \n",
        "root_shell                                     -0.004617   \n",
        "su_attempted                                   -0.005020   \n",
        "num_root                                       -0.015968   \n",
        "num_file_creations                             -0.015018   \n",
        "num_shells                                     -0.003521   \n",
        "num_access_files                               -0.033288   \n",
        "num_outbound_cmds                              -0.000455   \n",
        "is_hot_login                                    0.000283   \n",
        "is_guest_login                                 -0.038092   \n",
        "count                                           0.776906   \n",
        "srv_count                                       0.812280   \n",
        "serror_rate                                    -0.650336   \n",
        "srv_serror_rate                                -0.646256   \n",
        "rerror_rate                                    -0.278465   \n",
        "srv_rerror_rate                                -0.282239   \n",
        "same_srv_rate                                   0.732841   \n",
        "diff_srv_rate                                  -0.727031   \n",
        "srv_diff_host_rate                             -0.222707   \n",
        "dst_host_count                                  0.189876   \n",
        "dst_host_srv_count                              0.769481   \n",
        "dst_host_same_srv_rate                          0.771158   \n",
        "dst_host_diff_srv_rate                         -0.766402   \n",
        "dst_host_same_src_port_rate                     1.000000   \n",
        "dst_host_srv_diff_host_rate                    -0.175310   \n",
        "dst_host_serror_rate                           -0.658737   \n",
        "dst_host_srv_serror_rate                       -0.652636   \n",
        "dst_host_rerror_rate                           -0.299273   \n",
        "dst_host_srv_rerror_rate                       -0.297100   \n",
        "\n",
        "                             dst_host_srv_diff_host_rate  \\\n",
        "duration                                        0.100692   \n",
        "src_bytes                                      -0.140231   \n",
        "dst_bytes                                       0.578557   \n",
        "land                                            0.016171   \n",
        "wrong_fragment                                  0.007306   \n",
        "urgent                                         -0.000976   \n",
        "hot                                            -0.014141   \n",
        "num_failed_logins                              -0.002588   \n",
        "logged_in                                       0.659078   \n",
        "num_compromised                                -0.020979   \n",
        "root_shell                                      0.008631   \n",
        "su_attempted                                    0.001052   \n",
        "num_root                                        0.061030   \n",
        "num_file_creations                              0.030590   \n",
        "num_shells                                      0.015882   \n",
        "num_access_files                                0.011765   \n",
        "num_outbound_cmds                               0.000288   \n",
        "is_hot_login                                    0.000538   \n",
        "is_guest_login                                 -0.012578   \n",
        "count                                          -0.496554   \n",
        "srv_count                                      -0.391712   \n",
        "serror_rate                                    -0.153568   \n",
        "srv_serror_rate                                -0.148072   \n",
        "rerror_rate                                     0.073061   \n",
        "srv_rerror_rate                                 0.075178   \n",
        "same_srv_rate                                   0.179040   \n",
        "diff_srv_rate                                  -0.176930   \n",
        "srv_diff_host_rate                              0.433173   \n",
        "dst_host_count                                 -0.918894   \n",
        "dst_host_srv_count                              0.043668   \n",
        "dst_host_same_srv_rate                          0.107926   \n",
        "dst_host_diff_srv_rate                         -0.088665   \n",
        "dst_host_same_src_port_rate                    -0.175310   \n",
        "dst_host_srv_diff_host_rate                     1.000000   \n",
        "dst_host_serror_rate                           -0.118697   \n",
        "dst_host_srv_serror_rate                       -0.103715   \n",
        "dst_host_rerror_rate                            0.114971   \n",
        "dst_host_srv_rerror_rate                        0.120767   \n",
        "\n",
        "                             dst_host_serror_rate  dst_host_srv_serror_rate  \\\n",
        "duration                                -0.056753                 -0.057298   \n",
        "src_bytes                               -0.645920                 -0.641792   \n",
        "dst_bytes                               -0.167047                 -0.158378   \n",
        "land                                     0.013566                  0.012265   \n",
        "wrong_fragment                           0.010387                 -0.024117   \n",
        "urgent                                  -0.001381                 -0.001370   \n",
        "hot                                     -0.004706                 -0.010721   \n",
        "num_failed_logins                        0.014713                  0.014914   \n",
        "logged_in                               -0.143283                 -0.132474   \n",
        "num_compromised                         -0.005019                 -0.004504   \n",
        "root_shell                              -0.003498                 -0.003032   \n",
        "su_attempted                             0.001974                  0.002893   \n",
        "num_root                                -0.008457                 -0.007096   \n",
        "num_file_creations                      -0.002257                 -0.004295   \n",
        "num_shells                              -0.001588                 -0.002357   \n",
        "num_access_files                        -0.011197                 -0.011487   \n",
        "num_outbound_cmds                       -0.000011                 -0.000372   \n",
        "is_hot_login                            -0.000076                 -0.000007   \n",
        "is_guest_login                          -0.001066                 -0.016885   \n",
        "count                                   -0.331571                 -0.335290   \n",
        "srv_count                               -0.449096                 -0.442823   \n",
        "serror_rate                              0.973947                  0.965663   \n",
        "srv_serror_rate                          0.967214                  0.970617   \n",
        "rerror_rate                             -0.094076                 -0.110646   \n",
        "srv_rerror_rate                         -0.096146                 -0.114341   \n",
        "same_srv_rate                           -0.830067                 -0.819335   \n",
        "diff_srv_rate                            0.807205                  0.795844   \n",
        "srv_diff_host_rate                      -0.097973                 -0.092661   \n",
        "dst_host_count                           0.123881                  0.113845   \n",
        "dst_host_srv_count                      -0.722607                 -0.708392   \n",
        "dst_host_same_srv_rate                  -0.742045                 -0.725272   \n",
        "dst_host_diff_srv_rate                   0.719275                  0.701149   \n",
        "dst_host_same_src_port_rate             -0.658737                 -0.652636   \n",
        "dst_host_srv_diff_host_rate             -0.118697                 -0.103715   \n",
        "dst_host_serror_rate                     1.000000                  0.968015   \n",
        "dst_host_srv_serror_rate                 0.968015                  1.000000   \n",
        "dst_host_rerror_rate                    -0.087531                 -0.111578   \n",
        "dst_host_srv_rerror_rate                -0.096899                 -0.110532   \n",
        "\n",
        "                             dst_host_rerror_rate  dst_host_srv_rerror_rate  \n",
        "duration                                -0.007759                 -0.013891  \n",
        "src_bytes                               -0.297338                 -0.300581  \n",
        "dst_bytes                               -0.003042                  0.001621  \n",
        "land                                     0.000389                 -0.001816  \n",
        "wrong_fragment                           0.046656                 -0.013666  \n",
        "urgent                                  -0.000786                 -0.000782  \n",
        "hot                                      0.199019                  0.189142  \n",
        "num_failed_logins                        0.032395                  0.032151  \n",
        "logged_in                                0.007236                  0.012979  \n",
        "num_compromised                          0.214115                  0.217858  \n",
        "root_shell                               0.002763                  0.002151  \n",
        "su_attempted                             0.003173                  0.001731  \n",
        "num_root                                -0.000421                 -0.005012  \n",
        "num_file_creations                       0.000626                 -0.001096  \n",
        "num_shells                              -0.000617                 -0.002020  \n",
        "num_access_files                        -0.004743                 -0.004552  \n",
        "num_outbound_cmds                       -0.000823                 -0.001038  \n",
        "is_hot_login                            -0.000435                 -0.000529  \n",
        "is_guest_login                           0.025282                 -0.004292  \n",
        "count                                   -0.261194                 -0.256176  \n",
        "srv_count                               -0.313442                 -0.308132  \n",
        "serror_rate                             -0.103198                 -0.105434  \n",
        "srv_serror_rate                         -0.122630                 -0.124656  \n",
        "rerror_rate                              0.910225                  0.911622  \n",
        "srv_rerror_rate                          0.904591                  0.914904  \n",
        "same_srv_rate                           -0.282487                 -0.282913  \n",
        "diff_srv_rate                            0.299041                  0.298904  \n",
        "srv_diff_host_rate                       0.022585                  0.024722  \n",
        "dst_host_count                          -0.125142                 -0.125273  \n",
        "dst_host_srv_count                      -0.312040                 -0.300787  \n",
        "dst_host_same_srv_rate                  -0.278068                 -0.264383  \n",
        "dst_host_diff_srv_rate                   0.287476                  0.271067  \n",
        "dst_host_same_src_port_rate             -0.299273                 -0.297100  \n",
        "dst_host_srv_diff_host_rate              0.114971                  0.120767  \n",
        "dst_host_serror_rate                    -0.087531                 -0.096899  \n",
        "dst_host_srv_serror_rate                -0.111578                 -0.110532  \n",
        "dst_host_rerror_rate                     1.000000                  0.950964  \n",
        "dst_host_srv_rerror_rate                 0.950964                  1.000000  "
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We have used a *Pandas* `DataFrame` here to render the correlation matrix in a more readable way. Next we want to find the pairs of variables that are highly correlated, which takes a bit of dataframe manipulation.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
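      "# boolean dataframe: True where a pair of variables has an absolute correlation above 0.8.\n",
      "# The `< 1.0` condition drops the diagonal (each variable correlates 1.0 with itself);\n",
      "# note that it would also drop any off-diagonal pair with a correlation of exactly 1.0\n",
      "highly_correlated_df = (abs(corr_df) > .8) & (corr_df < 1.0)\n",
      "# names of the variables that are highly correlated with at least one other variable\n",
      "correlated_vars_index = highly_correlated_df.any()\n",
      "correlated_var_names = correlated_vars_index[correlated_vars_index].index\n",
      "# slice the dataframe to keep only those variables, in both rows and columns\n",
      "highly_correlated_df.loc[correlated_var_names,correlated_var_names]"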
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>src_bytes</th>\n",
        "      <th>dst_bytes</th>\n",
        "      <th>hot</th>\n",
        "      <th>logged_in</th>\n",
        "      <th>num_compromised</th>\n",
        "      <th>num_outbound_cmds</th>\n",
        "      <th>is_hot_login</th>\n",
        "      <th>count</th>\n",
        "      <th>srv_count</th>\n",
        "      <th>serror_rate</th>\n",
        "      <th>srv_serror_rate</th>\n",
        "      <th>rerror_rate</th>\n",
        "      <th>srv_rerror_rate</th>\n",
        "      <th>same_srv_rate</th>\n",
        "      <th>diff_srv_rate</th>\n",
        "      <th>dst_host_count</th>\n",
        "      <th>dst_host_srv_count</th>\n",
        "      <th>dst_host_same_srv_rate</th>\n",
        "      <th>dst_host_diff_srv_rate</th>\n",
        "      <th>dst_host_same_src_port_rate</th>\n",
        "      <th>dst_host_srv_diff_host_rate</th>\n",
        "      <th>dst_host_serror_rate</th>\n",
        "      <th>dst_host_srv_serror_rate</th>\n",
        "      <th>dst_host_rerror_rate</th>\n",
        "      <th>dst_host_srv_rerror_rate</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>src_bytes</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_bytes</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>hot</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>logged_in</th>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_compromised</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>num_outbound_cmds</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>is_hot_login</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>count</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_count</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>serror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_serror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>rerror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>srv_rerror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>same_srv_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>diff_srv_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_count</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_count</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_same_srv_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_diff_srv_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_same_src_port_rate</th>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_diff_host_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_serror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_serror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_rerror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>dst_host_srv_rerror_rate</th>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td> False</td>\n",
        "      <td>  True</td>\n",
        "      <td> False</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 22,
       "text": [
        "                            src_bytes dst_bytes    hot logged_in  \\\n",
        "src_bytes                       False     False  False     False   \n",
        "dst_bytes                       False     False  False      True   \n",
        "hot                             False     False  False     False   \n",
        "logged_in                       False      True  False     False   \n",
        "num_compromised                 False     False   True     False   \n",
        "num_outbound_cmds               False     False  False     False   \n",
        "is_hot_login                    False     False  False     False   \n",
        "count                           False     False  False     False   \n",
        "srv_count                       False     False  False     False   \n",
        "serror_rate                     False     False  False     False   \n",
        "srv_serror_rate                 False     False  False     False   \n",
        "rerror_rate                     False     False  False     False   \n",
        "srv_rerror_rate                 False     False  False     False   \n",
        "same_srv_rate                   False     False  False     False   \n",
        "diff_srv_rate                   False     False  False     False   \n",
        "dst_host_count                  False     False  False     False   \n",
        "dst_host_srv_count              False     False  False     False   \n",
        "dst_host_same_srv_rate          False     False  False     False   \n",
        "dst_host_diff_srv_rate          False     False  False     False   \n",
        "dst_host_same_src_port_rate      True     False  False     False   \n",
        "dst_host_srv_diff_host_rate     False     False  False     False   \n",
        "dst_host_serror_rate            False     False  False     False   \n",
        "dst_host_srv_serror_rate        False     False  False     False   \n",
        "dst_host_rerror_rate            False     False  False     False   \n",
        "dst_host_srv_rerror_rate        False     False  False     False   \n",
        "\n",
        "                            num_compromised num_outbound_cmds is_hot_login  \\\n",
        "src_bytes                             False             False        False   \n",
        "dst_bytes                             False             False        False   \n",
        "hot                                    True             False        False   \n",
        "logged_in                             False             False        False   \n",
        "num_compromised                       False             False        False   \n",
        "num_outbound_cmds                     False             False         True   \n",
        "is_hot_login                          False              True        False   \n",
        "count                                 False             False        False   \n",
        "srv_count                             False             False        False   \n",
        "serror_rate                           False             False        False   \n",
        "srv_serror_rate                       False             False        False   \n",
        "rerror_rate                           False             False        False   \n",
        "srv_rerror_rate                       False             False        False   \n",
        "same_srv_rate                         False             False        False   \n",
        "diff_srv_rate                         False             False        False   \n",
        "dst_host_count                        False             False        False   \n",
        "dst_host_srv_count                    False             False        False   \n",
        "dst_host_same_srv_rate                False             False        False   \n",
        "dst_host_diff_srv_rate                False             False        False   \n",
        "dst_host_same_src_port_rate           False             False        False   \n",
        "dst_host_srv_diff_host_rate           False             False        False   \n",
        "dst_host_serror_rate                  False             False        False   \n",
        "dst_host_srv_serror_rate              False             False        False   \n",
        "dst_host_rerror_rate                  False             False        False   \n",
        "dst_host_srv_rerror_rate              False             False        False   \n",
        "\n",
        "                             count srv_count serror_rate srv_serror_rate  \\\n",
        "src_bytes                    False     False       False           False   \n",
        "dst_bytes                    False     False       False           False   \n",
        "hot                          False     False       False           False   \n",
        "logged_in                    False     False       False           False   \n",
        "num_compromised              False     False       False           False   \n",
        "num_outbound_cmds            False     False       False           False   \n",
        "is_hot_login                 False     False       False           False   \n",
        "count                        False      True       False           False   \n",
        "srv_count                     True     False       False           False   \n",
        "serror_rate                  False     False       False            True   \n",
        "srv_serror_rate              False     False        True           False   \n",
        "rerror_rate                  False     False       False           False   \n",
        "srv_rerror_rate              False     False       False           False   \n",
        "same_srv_rate                False     False        True            True   \n",
        "diff_srv_rate                False     False        True            True   \n",
        "dst_host_count               False     False       False           False   \n",
        "dst_host_srv_count           False     False       False           False   \n",
        "dst_host_same_srv_rate       False     False       False           False   \n",
        "dst_host_diff_srv_rate       False     False       False           False   \n",
        "dst_host_same_src_port_rate  False      True       False           False   \n",
        "dst_host_srv_diff_host_rate  False     False       False           False   \n",
        "dst_host_serror_rate         False     False        True            True   \n",
        "dst_host_srv_serror_rate     False     False        True            True   \n",
        "dst_host_rerror_rate         False     False       False           False   \n",
        "dst_host_srv_rerror_rate     False     False       False           False   \n",
        "\n",
        "                            rerror_rate srv_rerror_rate same_srv_rate  \\\n",
        "src_bytes                         False           False         False   \n",
        "dst_bytes                         False           False         False   \n",
        "hot                               False           False         False   \n",
        "logged_in                         False           False         False   \n",
        "num_compromised                   False           False         False   \n",
        "num_outbound_cmds                 False           False         False   \n",
        "is_hot_login                      False           False         False   \n",
        "count                             False           False         False   \n",
        "srv_count                         False           False         False   \n",
        "serror_rate                       False           False          True   \n",
        "srv_serror_rate                   False           False          True   \n",
        "rerror_rate                       False            True         False   \n",
        "srv_rerror_rate                    True           False         False   \n",
        "same_srv_rate                     False           False         False   \n",
        "diff_srv_rate                     False           False          True   \n",
        "dst_host_count                    False           False         False   \n",
        "dst_host_srv_count                False           False          True   \n",
        "dst_host_same_srv_rate            False           False          True   \n",
        "dst_host_diff_srv_rate            False           False          True   \n",
        "dst_host_same_src_port_rate       False           False         False   \n",
        "dst_host_srv_diff_host_rate       False           False         False   \n",
        "dst_host_serror_rate              False           False          True   \n",
        "dst_host_srv_serror_rate          False           False          True   \n",
        "dst_host_rerror_rate               True            True         False   \n",
        "dst_host_srv_rerror_rate           True            True         False   \n",
        "\n",
        "                            diff_srv_rate dst_host_count dst_host_srv_count  \\\n",
        "src_bytes                           False          False              False   \n",
        "dst_bytes                           False          False              False   \n",
        "hot                                 False          False              False   \n",
        "logged_in                           False          False              False   \n",
        "num_compromised                     False          False              False   \n",
        "num_outbound_cmds                   False          False              False   \n",
        "is_hot_login                        False          False              False   \n",
        "count                               False          False              False   \n",
        "srv_count                           False          False              False   \n",
        "serror_rate                          True          False              False   \n",
        "srv_serror_rate                      True          False              False   \n",
        "rerror_rate                         False          False              False   \n",
        "srv_rerror_rate                     False          False              False   \n",
        "same_srv_rate                        True          False               True   \n",
        "diff_srv_rate                       False          False               True   \n",
        "dst_host_count                      False          False              False   \n",
        "dst_host_srv_count                   True          False              False   \n",
        "dst_host_same_srv_rate               True          False               True   \n",
        "dst_host_diff_srv_rate               True          False               True   \n",
        "dst_host_same_src_port_rate         False          False              False   \n",
        "dst_host_srv_diff_host_rate         False           True              False   \n",
        "dst_host_serror_rate                 True          False              False   \n",
        "dst_host_srv_serror_rate            False          False              False   \n",
        "dst_host_rerror_rate                False          False              False   \n",
        "dst_host_srv_rerror_rate            False          False              False   \n",
        "\n",
        "                            dst_host_same_srv_rate dst_host_diff_srv_rate  \\\n",
        "src_bytes                                    False                  False   \n",
        "dst_bytes                                    False                  False   \n",
        "hot                                          False                  False   \n",
        "logged_in                                    False                  False   \n",
        "num_compromised                              False                  False   \n",
        "num_outbound_cmds                            False                  False   \n",
        "is_hot_login                                 False                  False   \n",
        "count                                        False                  False   \n",
        "srv_count                                    False                  False   \n",
        "serror_rate                                  False                  False   \n",
        "srv_serror_rate                              False                  False   \n",
        "rerror_rate                                  False                  False   \n",
        "srv_rerror_rate                              False                  False   \n",
        "same_srv_rate                                 True                   True   \n",
        "diff_srv_rate                                 True                   True   \n",
        "dst_host_count                               False                  False   \n",
        "dst_host_srv_count                            True                   True   \n",
        "dst_host_same_srv_rate                       False                   True   \n",
        "dst_host_diff_srv_rate                        True                  False   \n",
        "dst_host_same_src_port_rate                  False                  False   \n",
        "dst_host_srv_diff_host_rate                  False                  False   \n",
        "dst_host_serror_rate                         False                  False   \n",
        "dst_host_srv_serror_rate                     False                  False   \n",
        "dst_host_rerror_rate                         False                  False   \n",
        "dst_host_srv_rerror_rate                     False                  False   \n",
        "\n",
        "                            dst_host_same_src_port_rate  \\\n",
        "src_bytes                                          True   \n",
        "dst_bytes                                         False   \n",
        "hot                                               False   \n",
        "logged_in                                         False   \n",
        "num_compromised                                   False   \n",
        "num_outbound_cmds                                 False   \n",
        "is_hot_login                                      False   \n",
        "count                                             False   \n",
        "srv_count                                          True   \n",
        "serror_rate                                       False   \n",
        "srv_serror_rate                                   False   \n",
        "rerror_rate                                       False   \n",
        "srv_rerror_rate                                   False   \n",
        "same_srv_rate                                     False   \n",
        "diff_srv_rate                                     False   \n",
        "dst_host_count                                    False   \n",
        "dst_host_srv_count                                False   \n",
        "dst_host_same_srv_rate                            False   \n",
        "dst_host_diff_srv_rate                            False   \n",
        "dst_host_same_src_port_rate                       False   \n",
        "dst_host_srv_diff_host_rate                       False   \n",
        "dst_host_serror_rate                              False   \n",
        "dst_host_srv_serror_rate                          False   \n",
        "dst_host_rerror_rate                              False   \n",
        "dst_host_srv_rerror_rate                          False   \n",
        "\n",
        "                            dst_host_srv_diff_host_rate dst_host_serror_rate  \\\n",
        "src_bytes                                         False                False   \n",
        "dst_bytes                                         False                False   \n",
        "hot                                               False                False   \n",
        "logged_in                                         False                False   \n",
        "num_compromised                                   False                False   \n",
        "num_outbound_cmds                                 False                False   \n",
        "is_hot_login                                      False                False   \n",
        "count                                             False                False   \n",
        "srv_count                                         False                False   \n",
        "serror_rate                                       False                 True   \n",
        "srv_serror_rate                                   False                 True   \n",
        "rerror_rate                                       False                False   \n",
        "srv_rerror_rate                                   False                False   \n",
        "same_srv_rate                                     False                 True   \n",
        "diff_srv_rate                                     False                 True   \n",
        "dst_host_count                                     True                False   \n",
        "dst_host_srv_count                                False                False   \n",
        "dst_host_same_srv_rate                            False                False   \n",
        "dst_host_diff_srv_rate                            False                False   \n",
        "dst_host_same_src_port_rate                       False                False   \n",
        "dst_host_srv_diff_host_rate                       False                False   \n",
        "dst_host_serror_rate                              False                False   \n",
        "dst_host_srv_serror_rate                          False                 True   \n",
        "dst_host_rerror_rate                              False                False   \n",
        "dst_host_srv_rerror_rate                          False                False   \n",
        "\n",
        "                            dst_host_srv_serror_rate dst_host_rerror_rate  \\\n",
        "src_bytes                                      False                False   \n",
        "dst_bytes                                      False                False   \n",
        "hot                                            False                False   \n",
        "logged_in                                      False                False   \n",
        "num_compromised                                False                False   \n",
        "num_outbound_cmds                              False                False   \n",
        "is_hot_login                                   False                False   \n",
        "count                                          False                False   \n",
        "srv_count                                      False                False   \n",
        "serror_rate                                     True                False   \n",
        "srv_serror_rate                                 True                False   \n",
        "rerror_rate                                    False                 True   \n",
        "srv_rerror_rate                                False                 True   \n",
        "same_srv_rate                                   True                False   \n",
        "diff_srv_rate                                  False                False   \n",
        "dst_host_count                                 False                False   \n",
        "dst_host_srv_count                             False                False   \n",
        "dst_host_same_srv_rate                         False                False   \n",
        "dst_host_diff_srv_rate                         False                False   \n",
        "dst_host_same_src_port_rate                    False                False   \n",
        "dst_host_srv_diff_host_rate                    False                False   \n",
        "dst_host_serror_rate                            True                False   \n",
        "dst_host_srv_serror_rate                       False                False   \n",
        "dst_host_rerror_rate                           False                False   \n",
        "dst_host_srv_rerror_rate                       False                 True   \n",
        "\n",
        "                            dst_host_srv_rerror_rate  \n",
        "src_bytes                                      False  \n",
        "dst_bytes                                      False  \n",
        "hot                                            False  \n",
        "logged_in                                      False  \n",
        "num_compromised                                False  \n",
        "num_outbound_cmds                              False  \n",
        "is_hot_login                                   False  \n",
        "count                                          False  \n",
        "srv_count                                      False  \n",
        "serror_rate                                    False  \n",
        "srv_serror_rate                                False  \n",
        "rerror_rate                                     True  \n",
        "srv_rerror_rate                                 True  \n",
        "same_srv_rate                                  False  \n",
        "diff_srv_rate                                  False  \n",
        "dst_host_count                                 False  \n",
        "dst_host_srv_count                             False  \n",
        "dst_host_same_srv_rate                         False  \n",
        "dst_host_diff_srv_rate                         False  \n",
        "dst_host_same_src_port_rate                    False  \n",
        "dst_host_srv_diff_host_rate                    False  \n",
        "dst_host_serror_rate                           False  \n",
        "dst_host_srv_serror_rate                       False  \n",
        "dst_host_rerror_rate                            True  \n",
        "dst_host_srv_rerror_rate                       False  "
       ]
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Conclusions and possible model selection hints"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The previous dataframe showed us which variables are highly correlated. We have kept only those variables with at least one strong correlation. We can use this information in several ways, but a good one is model selection: if a group of variables is highly correlated, we can keep just one of them to represent the group, under the assumption that they convey similar information as predictors. Reducing the number of variables will not necessarily improve our model's accuracy, but it will make the model easier to understand and more efficient to compute.  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For example, from the description of the [KDD Cup 99 task](http://kdd.ics.uci.edu/databases/kddcup99/task.html) we know that the variable `dst_host_same_src_port_rate` is the percentage of the last 100 connections to the same port, for the same destination host. In our correlation matrix (and auxiliary dataframes) we find that this variable is highly and positively correlated with `src_bytes` and `srv_count`. The former is the number of bytes sent from source to destination. The latter is the number of connections to the same service as the current connection in the past 2 seconds. We might decide not to include `dst_host_same_src_port_rate` in our model if we include the other two, as a way to reduce the number of variables and later on better interpret our models.  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Later on, in the notebooks dedicated to building predictive models, we will use this information to build more interpretable models.   "
     ]
    }
   ],
   "metadata": {}
  }
 ]
}